Abstract. We consider the evolution of populations under the joint action of mutation and differential reproduction, or selection. The population is modelled as a finite-type Markov branching process in continuous time, and the associated genealogical tree is viewed both in the forward and the backward direction of time. The stationary type distribution of the reversed process, the so-called ancestral distribution, turns out to be a key to the study of mutation-selection balance. This balance can be expressed in the form of a variational principle that quantifies the respective roles of reproduction and mutation for any possible type distribution. It shows that the mean growth rate of the population results from a competition for a maximal long-term growth rate, as given by the difference between the current mean reproduction rate and an asymptotic decay rate related to the mutation process; this tradeoff is won by the ancestral distribution.

We then focus on the case when the type is determined by a sequence of letters (like nucleotides or matches/mismatches relative to a reference sequence), and we ask how much of the above competition can still be seen by observing only the letter composition (as given by the frequencies of the various letters within the sequence).

If mutation and reproduction rates can be approximated in a smooth way, 
the fitness of letter compositions resulting from the interplay of reproduction and 
mutation is determined in the limit as the number of sequence sites tends to 
infinity. 

Our main application is the quasispecies model of sequence evolution with mutation coupled to reproduction but independent across sites, and a fitness function that is invariant under permutation of sites. In this model, the fitness of letter compositions is worked out explicitly. In certain cases, their competition leads to a phase transition.
1. Introduction

Evolution is often understood as an optimization process of some kind, and there is a long tradition of considering evolutionary models, particularly those from population genetics, from a variational perspective. The most popular result in this context is known as the Fundamental Theorem of Natural Selection (FTNS). In its simplest form it states that, in the deterministic selection equation for a single locus in continuous time, mean fitness can only increase along trajectories (i.e., it is a Lyapunov function), and the rate of this increase equals the variance in fitness, cf. [5, Ch. I.10.3]. More sophisticated versions in the context of quantitative genetics and multiple loci, along with a general discussion of optimality principles for the selection equation, are discussed in [5, Ch. II.6.3-II.6.6] and [11, Ch. 2.9, 7.4.5 and 7.4.6]; see also [8].

If, rather than selection alone, the joint dynamics of selection and mutation is considered, results are sparse. The FTNS may be generalized to house-of-cards mutation (i.e., mutation rates are independent of the parent type), see [1] and [19]. If mutation is reversible, a Lyapunov function is available for a certain $L^2$-renormalized version of the dynamics, but not for the original mutation-selection equation [33].

The above approaches refer to the genetic (or, more generally, type) composition at the population level. In contrast, this article is concerned with a variational principle in mutation-selection models (and closely related branching processes) from the point of view of individual lineages through time, their ancestry and genealogy. This principle is related to the (stochastic) processes that take place along such lines of descent, with a special emphasis on the relation between the present and the past. We will, however, not include genetic drift (i.e., resampling) in our models; therefore, our backward point of view differs from that of the coalescent process (see [17] for a recent review of this area).

The paper is organized as follows. In Section 2, we will set up our model(s) and recapitulate a few fundamental facts. Section 3 provides an informal preview of the results that will be detailed (and proved) in the remainder of the article. Section 4 will develop the lineage aspect that will be required further on. Looking at the mutation process along individual lines, we will obtain a fairly general variational principle (Section 5, Thm. 1), which quantifies the tradeoff between the mean reproduction rate along a line and the asymptotic rate at which it is lost; it further implies a connection between the type processes that emerge in the forward and backward directions of time. In Sections 6 and 7, we will specialize to the case where types are sequences over a finite alphabet. If mutation is independent and fitness is additive across sites, the original high-dimensional variational principle may be reduced to a simpler, low-dimensional one (Section 6.3, Thm. 2). The same holds asymptotically if mutation rates and fitness function allow for a suitable smooth approximation when the number of sites gets large (Section 7, Thms. 3, 4 and 5). The corresponding approximate maximum principle will be derived explicitly for the quasispecies model of sequence evolution (Section 8, Thm. 6).

The paper ties together, unifies and generalizes various aspects that 
have appeared in previous publications. Special cases of the low-dimensional 
maximum principle were first described in [18], and applied to concrete 
examples. An extension appeared in [3]; it relies on methods from linear 
algebra and asymptotic analysis, but makes no connection to the stochastic 
processes on individual lines, nor does it include worked examples. The 
connection to the backward point of view relies on earlier work on branching 
processes [22,23] and was investigated in [18] and [15]. These results will 
reappear here as parts of a larger picture. 

2. Models and basic facts 

2.1. Models 

Consider a finite set of types $S$ (with $|S| > 1$) and a population of individuals, each of which carries one of these types. (We think of individuals as haploid, and of types as alleles.)

2.1.1 The parallel mutation-reproduction model. Let us start with the most basic mutation-reproduction model, in which mutation and reproduction occur in parallel, that is, independently. As depicted in Fig. 1, an individual of type $i \in S$ may, at every instant in continuous time, do one of three things: it may split, i.e., produce a copy of itself (this happens at birth rate $B_i \ge 0$), it may die (at rate $D_i \ge 0$), or it may mutate to type $j$ ($j \ne i$) (at rate $U_{ij} \ge 0$). Different meanings may be associated with this verbal description. Probabilists will take it to mean a multi-type Markov branching process in continuous time (see [2, Ch. V.7], or [25, Ch. 8] for a general overview). That is, an $i$-individual waits for an exponential time with parameter $a_i = B_i + D_i + \sum_{j: j \ne i} U_{ij}$, and then splits, dies, or mutates to type $j \ne i$ with probabilities $B_i/a_i$, $D_i/a_i$, and $U_{ij}/a_i$, respectively. The number of individuals of type $j$ at time $t$, $Z_j(t) \in \mathbb{Z}_{\ge 0} := \{0, 1, 2, \dots\}$, is a random variable; the collection $Z(t) = (Z_j(t))_{j \in S}$ is a random vector. The corresponding expectation is described by the first-moment generator $A = U + R$. Here, $U$ is the Markov generator $U = (U_{ij})_{i,j \in S}$, where the mutation rates $U_{ij}$ for $j \ne i$ are complemented by $U_{ii} := -\sum_{j: j \ne i} U_{ij}$ for all $i \in S$. Further, $R := \mathrm{diag}\{R_i \mid i \in S\}$, where $R_i := B_i - D_i$ is the net reproduction rate (or Malthusian fitness). More precisely, we have $\mathbb{E}_i(Z_j(t)) = (e^{tA})_{ij}$, where $\mathbb{E}_i(Z_j(t))$ is the expected number of $j$-individuals at time $t$ in a population started by a single $i$-individual at time 0.

Fig. 1. The parallel mutation-reproduction model.
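The probabilistic reading of the parallel model can be made concrete with a small simulation. The following Python sketch is our own illustration, not code from the paper: it runs a Gillespie-type simulation of the branching process and compares Monte Carlo averages of $Z(t)$ with the row $(e^{tA})_{i\cdot}$. The helper name `simulate_parallel` and the two-type rates `B`, `D`, `U_off` are hypothetical, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.linalg import expm


def simulate_parallel(i0, t_max, B, D, U_off, rng):
    """One realization of the parallel model started by a single i0-individual.

    Returns the vector of type counts Z(t_max). Assumes a_i > 0 for all types.
    """
    n = len(B)
    mut_rate = U_off.sum(axis=1)            # sum_{j != i} U_ij
    a = B + D + mut_rate                    # parameter a_i of the exponential clock
    Z = np.zeros(n, dtype=int)
    Z[i0] = 1
    t = 0.0
    while Z.sum() > 0:
        total = float(Z @ a)                # total event rate of the population
        t += rng.exponential(1.0 / total)
        if t > t_max:
            break
        i = rng.choice(n, p=Z * a / total)  # type of the individual that acts
        u = rng.random() * a[i]
        if u < B[i]:                        # split: one more i-individual
            Z[i] += 1
        elif u < B[i] + D[i]:               # death
            Z[i] -= 1
        else:                               # mutation i -> j with prob. U_ij / sum_j U_ij
            j = rng.choice(n, p=U_off[i] / mut_rate[i])
            Z[i] -= 1
            Z[j] += 1
    return Z


# Hypothetical two-type example.
B = np.array([1.0, 0.6])
D = np.array([0.3, 0.4])
U_off = np.array([[0.0, 0.5],
                  [0.2, 0.0]])                  # off-diagonal mutation rates U_ij
U = U_off - np.diag(U_off.sum(axis=1))          # Markov generator, rows sum to 0
A = U + np.diag(B - D)                          # first-moment generator A = U + R

t_max = 1.0
rng = np.random.default_rng(1)
samples = np.array([simulate_parallel(0, t_max, B, D, U_off, rng) for _ in range(4000)])
mc_mean = samples.mean(axis=0)                  # Monte Carlo estimate of E_0(Z_j(t_max))
exact = expm(t_max * A)[0]                      # (e^{tA})_{0j}
print(mc_mean, exact)
```

The agreement is only statistical; increasing the number of replicates tightens it, as expected from the identity $\mathbb{E}_i(Z_j(t)) = (e^{tA})_{ij}$.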

2.1.2 Deterministic aspects. Ignoring stochastic effects and focussing on the mean behaviour of the population, one often considers the deterministic mutation-reproduction model

$$\dot y(t) = y(t)\,A, \qquad y(0) = y_0, \tag{1}$$

where $y(t) = (y_i(t))_{i \in S}$ is the row vector associating to each type $i$ its abundance $y_i(t) \in \mathbb{R}_{\ge 0}$ (i.e., the size of the subpopulation of type $i$). As $y(t) = y_0 e^{tA}$, the deterministic model describes the expectation of the corresponding branching process, provided the initial condition is chosen accordingly.

However, the independent reproduction of individuals implied so far is unrealistic for large populations, which usually experience density regulation. In the simplest case, this is modelled by an additional death term $\gamma(t) \ge 0$, that is, $D_i$ is replaced by $D_i + \gamma(t)$ (for all $i \in S$), where $\gamma(t)$ may depend on time (e.g., through total population size), but not on the type. Then, of course, (1) generalizes to

$$\dot y(t) = y(t)\,\big(A - \gamma(t) I\big), \tag{2}$$

where $I$ is the identity matrix. In theoretical ecology, a wide variety of models is in use that specify $\gamma$ for the many biological situations that may arise. In population genetics, however, one is usually more interested in the relative frequencies $q_i(t) := y_i(t) / \sum_{j \in S} y_j(t)$. Differentiating this and inserting (2) leads to

$$\dot q_i(t) = q_i(t)\big(R_i - \langle q(t), R\rangle\big) + \sum_{j \in S}\big(q_j(t)\,U_{ji} - q_i(t)\,U_{ij}\big), \tag{3}$$

independently of $\gamma$. Here we think of the row vector $q = (q_i)_{i \in S}$ as a probability measure, of the column vector $R = (R_i)_{i \in S}$ as a function on $S$ (known as the fitness function), and of the scalar product $\langle q(t), R\rangle = \sum_{i \in S} q_i(t) R_i$ as the associated expectation, namely, the mean fitness of the population at time $t$. Eq. (3) is the well-known parallel (or decoupled) mutation-selection model, which goes back to [6, p. 265]. Although we have derived it here for haploid populations (and will adhere to this picture), it is well known, and easily verified, that the same equation describes diploids without dominance (in an approximation using Hardy-Weinberg proportions). For a comprehensive review of the model and its properties, see [5, Ch. III].
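As a numerical sanity check on Eq. (3), one can integrate it and compare the long-time frequencies with the normalized principal left eigenvector of $A$. The Python sketch below is a hedged illustration only: the three-type fitness vector `R` and generator `U` are hypothetical values, and SciPy's `solve_ivp` is merely one convenient integrator.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical three-type example: fitness R_i and mutation generator U.
R = np.array([1.0, 0.5, 0.2])
U = np.array([[-0.3, 0.2, 0.1],
              [0.1, -0.2, 0.1],
              [0.05, 0.15, -0.2]])          # rows sum to zero
A = U + np.diag(R)


def mutation_selection(t, q):
    """Right-hand side of Eq. (3).

    (q @ U)_i = sum_j q_j U_ji already equals sum_j (q_j U_ji - q_i U_ij),
    because U_ii = -sum_{j != i} U_ij.
    """
    return q * (R - q @ R) + q @ U


q0 = np.array([0.0, 0.0, 1.0])                 # start with type 2 only
sol = solve_ivp(mutation_selection, (0.0, 60.0), q0, rtol=1e-10, atol=1e-12)
q_final = sol.y[:, -1]

# Principal left eigenvector pi of A (an eigenvector of A^T), scaled to sum 1.
eigvals, eigvecs = np.linalg.eig(A.T)
k = np.argmax(eigvals.real)
lam = eigvals[k].real
pi = np.abs(eigvecs[:, k].real)
pi /= pi.sum()

print(q_final, pi)        # q(t) approaches pi
print(q_final @ R, lam)   # mean fitness approaches lambda
```

The last line anticipates the identity $\langle \pi, R\rangle = \lambda$ discussed in Sec. 2.2.1.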

Rather than considering deterministic and stochastic models separately, we aim at a unifying picture and note that the branching process is particularly versatile: its expectation fulfills (1), and the solution of (1), in turn, implies that of (3) (via normalization). Properties of the branching process will, therefore, immediately translate into properties of the mutation-selection equation (but not, necessarily, vice versa). For this reason, we will consider the branching process as our primary model throughout this paper. Let us, therefore, return to branching populations and look at alternatives to the parallel model.

2.1.3 The coupled mutation-reproduction model. In this model one assumes that mutations occur on the occasion of reproduction events (see Fig. 2): an individual again dies at rate $D_i$ and gives birth at rate $B_i$, but every time it gives birth, the offspring may be mutated (it is of type $j$ with probability $P_{ij}$), while the parent itself survives unchanged. The corresponding first-moment generator $A$ has elements

$$A_{ij} = B_i P_{ij} - \delta_{ij} D_i, \tag{4}$$

where $\delta_{ij}$ is the Kronecker delta. An example of the coupled model will be studied in Sec. 8.



Fig. 2. The coupled mutation-reproduction model.



2.1.4 General splitting rules. Both the parallel and the coupled models are special cases of the general Markov branching model as depicted in Fig. 3: an $i$-individual lives for an exponential time with prescribed parameter $a_i$ and then produces a random offspring $N_i = (N_{ij})_{j \in S}$ with distribution $p_i$ on $\mathbb{Z}_{\ge 0}^S$ and finite means $\mathbb{E}(N_{ij})$ for all $i, j \in S$. More precisely, $N_{ij} \in \mathbb{Z}_{\ge 0}$ is the number of children of type $j$, and $p_i(n) = \mathbb{P}(N_{ij} = n_j \ \forall j \in S)$ for $n = (n_j)_{j \in S}$. The first-moment generator $A$ has elements $A_{ij} = a_i\big(\mathbb{E}(N_{ij}) - \delta_{ij}\big)$.




Fig. 3. General splitting rules. 



For the coupled and the general branching rules, the first-moment generator may again be written in the 'parallel' form $A = U + R$, where $U$ is a Markov generator and $R$ is a diagonal matrix; this decomposition is uniquely given by $U_{ij} = A_{ij}$ for $i \ne j$, $U_{ii} = -\sum_{j: j \ne i} U_{ij}$, and $R_i = \sum_{j \in S} A_{ij}$ for all $i \in S$. For the time being, this is a formal decomposition, but it will receive its branching process interpretation in Sec. 4.2. The corresponding deterministic models then all take the form (1) and (3), provided the parameters are interpreted in the above way.
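The decomposition $A = U + R$ is purely mechanical, so a short sketch may help. The Python code below (our illustration; the helper names `coupled_generator` and `parallel_form` and all rates are hypothetical) builds the coupled-model generator of Eq. (4) and splits it into a Markov generator $U$ and fitness vector $R$. For the coupled model the rows of $P$ sum to one, so the recovered fitness is $R_i = B_i - D_i$.

```python
import numpy as np


def coupled_generator(B, D, P):
    """First-moment generator of the coupled model, Eq. (4): A_ij = B_i P_ij - delta_ij D_i."""
    return np.diag(B) @ P - np.diag(D)


def parallel_form(A):
    """Split A into a Markov generator U and the diagonal of R, with A = U + diag(R).

    U_ij = A_ij for i != j, U_ii = -sum_{j != i} U_ij, and R_i = sum_j A_ij.
    """
    R = A.sum(axis=1)
    U = A - np.diag(np.diag(A))          # keep the off-diagonal entries only
    U -= np.diag(U.sum(axis=1))          # diagonal makes every row sum to zero
    return U, R


# Hypothetical three-type coupled model.
B = np.array([2.0, 1.5, 1.0])
D = np.array([0.5, 0.5, 0.5])
P = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])       # row i: offspring-type distribution of parent i
A = coupled_generator(B, D, P)
U, R = parallel_form(A)
print(R)                                 # equals B - D here
```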
2.2. Fundamental facts

2.2.1 Forward view and long-time characteristics. We will assume throughout that $A$ (or, equivalently, $U$) is irreducible. Perron-Frobenius theory then tells us that $A$ has a principal eigenvalue $\lambda$ (namely, a real eigenvalue exceeding the real parts of all other eigenvalues) and associated positive left and right eigenvectors $\pi$ and $h$, which will be normalized so that $\langle \pi, \mathbf{1}\rangle = 1 = \langle \pi, h\rangle$, where $\mathbf{1} = (1)_{i \in S}$ is the vector with all coordinates equal to 1. We will further assume that $\lambda > 0$, i.e., the branching process is supercritical. This implies that the population will, in expectation, grow in the long run, as is obvious from (1); in individual realizations, it will survive with positive probability, and then grow to infinite size with probability one, see (6) below.

The asymptotic properties of our models forward in time are, to a large extent, determined by $\lambda$, $\pi$, and $h$, and provide further connections between the stochastic and the deterministic pictures. The left eigenvector $\pi$ holds the stationary composition of the population, in the sense that $\lim_{t \to \infty} q(t) = \pi$ for the differential equation (3), and, for the branching process,

$$\lim_{t \to \infty} \frac{Z(t)}{\|Z(t)\|_1} = \pi \quad \text{with probability one, conditionally on survival,} \tag{5}$$

where $\|Z(t)\|_1 := \sum_{i \in S} Z_i(t)$ is the total population size. This is due to the famous Kesten-Stigum theorem, see [27] for the discrete-time original, and [2, Thm. 2, p. 206] and [15, Thm. 2.1] for continuous-time versions.
Furthermore,

$$\langle \pi, R\rangle = \lambda = \lim_{t \to \infty} \frac{1}{t}\log\|y(t)\|_1 = \lim_{t \to \infty} \frac{1}{t}\log\|Z(t)\|_1 \tag{6}$$

is the asymptotic growth rate (or equilibrium mean fitness) of the population. Here the first equality follows from the identity $\lambda = \langle \pi A, \mathbf{1}\rangle = \langle \pi, A\mathbf{1}\rangle = \langle \pi, R\rangle$ (since the rows of $U$ sum to zero, $A\mathbf{1}$ is the fitness vector $(R_i)_{i \in S}$); the second one is an immediate consequence of (1) and Perron-Frobenius theory, and the third is from [15] and holds with probability one in the case of survival. Finally, the $i$-th coordinate $h_i$ of the right eigenvector $h$ measures the asymptotic mean offspring size of an $i$-individual, relative to the total size of the population:

$$h_i = \lim_{t \to \infty} \mathbb{E}_i\big(\|Z(t)\|_1\big)\, e^{-\lambda t}. \tag{7}$$

For more details concerning this quantity, see [18] and [15] (for the deterministic and stochastic pictures, respectively).
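The quantities $\lambda$, $\pi$ and $h$ are easy to compute for small type spaces. The following hedged Python sketch (hypothetical three-type example; SciPy's `eig` with `left=True` returns left and right eigenvectors in one call) applies the normalization $\langle \pi, \mathbf{1}\rangle = 1 = \langle \pi, h\rangle$, checks the first equality in (6), and approximates (7) at a large but finite time using $\mathbb{E}_i(\|Z(t)\|_1) = (e^{tA}\mathbf{1})_i$.

```python
import numpy as np
from scipy.linalg import eig, expm

# Hypothetical three-type example, A = U + R.
R = np.array([1.0, 0.5, 0.2])
U = np.array([[-0.3, 0.2, 0.1],
              [0.1, -0.2, 0.1],
              [0.05, 0.15, -0.2]])
A = U + np.diag(R)

# w: eigenvalues; columns of vl / vr: left / right eigenvectors.
w, vl, vr = eig(A, left=True, right=True)
k = np.argmax(w.real)                  # principal (Perron-Frobenius) eigenvalue
lam = w[k].real
pi = np.abs(vl[:, k].real)
h = np.abs(vr[:, k].real)
pi /= pi.sum()                         # <pi, 1> = 1
h /= pi @ h                            # <pi, h> = 1
alpha = pi * h                         # ancestral distribution (Sec. 2.2.2)

# Eq. (7) at a finite time t: E_i(||Z(t)||_1) e^{-lambda t} = (e^{tA} 1)_i e^{-lambda t}.
t = 40.0
h_growth = expm(t * A) @ np.ones(len(R)) * np.exp(-lam * t)

print(lam, pi @ R)                     # first equality in (6)
print(h, h_growth)
```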

2.2.2 Backward view and ancestral distribution. In the above, we have adopted the traditional view on branching processes, which is forward in time. It is less customary, but equally rewarding, to look at branching populations backward in time. To this end, consider picking individuals randomly (with equal weight) from the current population and tracing their lines of descent backward in time (see Fig. 4). If we pick an individual at time $t$ and ask for the probability that the type of its ancestor is $i$ at an earlier time $t - \tau$, the answer will be $\alpha_i = \pi_i h_i$ in the limit when first $t \to \infty$ and then $\tau \to \infty$. Thus the distribution $\alpha = (\alpha_i)_{i \in S}$ describes the population average of the ancestral types and is termed the ancestral distribution, see [15, Thm. 3.1] for details. Likewise, the time average along ancestral lines also converges to $\alpha$ in the long run, see [15, Thm. 3.2].




Fig. 4. The backward point of view (time runs from $t - \tau$ to $t$). The various types are indicated by different line styles. The fat lines mark the lines of descent defined by three individuals (bullets) picked from the branching population at time $t$. After coalescence of two such lines, the common ancestor receives twice the 'weight', as indicated by the extra fat line; this motivates the factor $h_i$ in the ancestral distribution.

If we pick individuals from the population at a very late time (so that its composition is given by the stationary vector $\pi$), then the type process in the backward direction is the Markov chain with generator $\widetilde G = (\widetilde G_{ij})_{i,j \in S}$,

$$\widetilde G_{ij} = \pi_j\big(A_{ji} - \lambda\delta_{ij}\big)\pi_i^{-1},$$

as first identified by Jagers [22,23]. The corresponding time-reversed process has generator $G = (G_{ij})_{i,j \in S}$, where

$$G_{ij} = \alpha_j\, \widetilde G_{ji}\, \alpha_i^{-1} = h_i^{-1}\big(A_{ij} - \lambda\delta_{ij}\big)h_j; \tag{8}$$

it has been considered in [15], has been termed the retrospective process, and may be understood as the forward type process along the ancestral lines leading to typical individuals of the present population. By definition, $\widetilde G$ and $G$ both have stationary distribution $\alpha$.
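Both generators can be written down directly from $A$, $\lambda$, $\pi$ and $h$. The Python sketch below is a hedged check (hypothetical example matrix; variable names are ours): it builds $\widetilde G$ and $G$ from the formulas above and verifies that both are Markov generators with stationary distribution $\alpha$, and that $G$ is the $\alpha$-time-reversal of $\widetilde G$.

```python
import numpy as np
from scipy.linalg import eig

# Hypothetical three-type example, A = U + R.
R = np.array([1.0, 0.5, 0.2])
U = np.array([[-0.3, 0.2, 0.1],
              [0.1, -0.2, 0.1],
              [0.05, 0.15, -0.2]])
A = U + np.diag(R)
n = len(R)

w, vl, vr = eig(A, left=True, right=True)
k = np.argmax(w.real)
lam = w[k].real
pi = np.abs(vl[:, k].real)
pi /= pi.sum()
h = np.abs(vr[:, k].real)
h /= pi @ h
alpha = pi * h                                   # ancestral distribution
I = np.eye(n)

# Backward generator: G_back[i, j] = pi_j (A_ji - lam delta_ij) / pi_i.
G_back = pi[None, :] * (A.T - lam * I) / pi[:, None]
# Retrospective generator, Eq. (8): G_retro[i, j] = (A_ij - lam delta_ij) h_j / h_i.
G_retro = (A - lam * I) * h[None, :] / h[:, None]

print(alpha @ G_back, alpha @ G_retro)           # both close to zero
```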

3. Preview of results 

In this Section, we will give an informal preview of the results that will be obtained in the remainder of the article. This overview will not aim at full generality, nor will it dwell on the specific technical conditions that are required to make things precise. Rather, we will try to motivate the concepts and explain the results in the context of the model. The details will be worked out in the later Sections.
We will work our way from the more general to the more specific. We will start with a general variational principle, valid for all model variants of the previous Section, irrespective of the type space and of the parameters. Next, we will specialize to the case where types are sequences, and mutation and reproduction rates are invariant under permutation of sites. This will allow us to dissect the variational problem into two simpler problems, which are easier to solve. Finally, we will treat one specific example, namely, the quasispecies model of sequence evolution, in full detail.

3.1. The general variational principle 

A main object of this paper is to show that the asymptotic growth rate $\lambda$ of the population can be understood as the result of a competition between the mutation and reproduction processes along a typical ancestral line. In this informal Section, however, we will avoid the family tree picture, and rather imagine we are observing just one line. To start with, we even ignore reproduction, and consider only the simple Markov process $\{M(t)\}_{t \ge 0}$ on $S$ with generator $U$, i.e., the type process which associates with $t$ the type at time $t$ under the mutation model $U$. A crucial quantity in what follows will be the corresponding empirical measure

$$L(t) := \frac{1}{t}\int_0^t \delta_{M(\tau)}\, d\tau, \tag{9}$$

i.e., the random vector with components $L_i(t) := \frac{1}{t}\int_0^t I\{M(\tau) = i\}\, d\tau$, where $I\{\cdot\}$ denotes the indicator function. This quantity measures the fraction of time the process spends in the various states, and hence is also known as occupation time measure. Clearly, $L(t)$ is a random element of $\mathcal{P}(S)$, the set of all probability measures on $S$. It is well known by the ergodic theorem for Markov chains that, for $t \to \infty$, one has $L(t) \to \rho$ with probability one, where $\rho$ is the stationary distribution of $U$. It is, perhaps, less well known that the rate of convergence may be characterized asymptotically by a so-called large deviation principle, which may be informally put as

$$\mathbb{P}\big(L(t) \approx \nu\big) \approx e^{-t\, I_U(\nu)} \quad \text{for large } t, \tag{10}$$

that is, the probability that $L(t)$ is close to some measure $\nu$ decays exponentially, for large time, with a decay rate (or rate function) $I_U(\nu)$ which can be written down explicitly (see (25) and (30) below). $I_U$ is nonnegative, and $I_U(\nu) = 0$ precisely for $\nu = \rho$, in line with the above fact that, in the long run, only the stationary measure $\rho$ will survive.
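The explicit expressions (25) and (30) are not reproduced in this section. As a hedged stand-in, the Python sketch below evaluates the standard Donsker-Varadhan formula for the occupation measure of a finite, irreducible continuous-time chain, $I_U(\nu) = \sup_{u > 0} \big[-\sum_i \nu_i (Uu)_i / u_i\big]$, which we assume agrees with the paper's rate function. For two states with rates $a$ ($0 \to 1$) and $b$ ($1 \to 0$) it reduces to the closed form $(\sqrt{a\nu_0} - \sqrt{b\nu_1})^2$, which the sketch uses as a check; the function name `rate_function` and all numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rate_function(nu, U):
    """Donsker-Varadhan rate function (assumed form):

        I_U(nu) = sup_{u > 0} -sum_i nu_i (U u)_i / u_i.

    With u = exp(x) this equals sum_i nu_i |U_ii| - min_x g(x), where
    g(x) = sum_{i != j} nu_i U_ij exp(x_j - x_i) is convex in x.
    Assumes U irreducible and nu strictly positive, so the minimum exists.
    """
    off = U - np.diag(np.diag(U))                  # off-diagonal rates only

    def g_and_grad(x):
        M = nu[:, None] * off * np.exp(x[None, :] - x[:, None])  # nu_i U_ij e^{x_j - x_i}
        return M.sum(), M.sum(axis=0) - M.sum(axis=1)

    res = minimize(g_and_grad, np.zeros(len(nu)), jac=True, method="BFGS")
    return float(nu @ (-np.diag(U))) - res.fun


a, b = 1.0, 2.0                                    # hypothetical rates 0 -> 1 and 1 -> 0
U2 = np.array([[-a, a],
               [b, -b]])
nu = np.array([0.3, 0.7])
explicit = (np.sqrt(a * nu[0]) - np.sqrt(b * nu[1])) ** 2   # two-state closed form
rho = np.array([b, a]) / (a + b)                   # stationary distribution of U2

print(rate_function(nu, U2), explicit)             # should agree
print(rate_function(rho, U2))                      # should be (close to) zero
```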

Let us now add reproduction, i.e., turn to the branching process. As a consequence of the above large deviation principle, we will obtain, in Thm. 1, a link between the forward-time stationary distribution $\pi$ of the branching process, the reproduction rate $R$, the asymptotic growth rate $\lambda$, the mutation process $U$ and the ancestral distribution $\alpha$, namely, the equation

$$\langle \pi, R\rangle = \lambda = \max_{\nu \in \mathcal{P}(S)}\big[\langle \nu, R\rangle - I_U(\nu)\big] = \langle \alpha, R\rangle - I_U(\alpha). \tag{11}$$
This variational principle may be understood in terms of a competition between all possible distributions for a maximal long-term growth rate, as given by the difference between the current mean reproduction rate $\langle \nu, R\rangle$ and the asymptotic decay rate $I_U(\nu)$. The first quantity is maximized by those measures that put mass only on the fittest type(s); the second one is minimized by $\rho$; the tradeoff is won by $\alpha$. Furthermore, (11) connects the forward and the retrospective point of view in that the maximum equals the mean fitness $\langle \pi, R\rangle$ of the stationary population. Note that the mean fitness of the ancestral population exceeds the mean fitness of the stationary one by $I_U(\alpha)$, which is positive unless $\alpha = \rho$ (which implies $R_i = \text{const.}$, i.e., there is no selection). This reflects the fact that the present population carries with it a tail of (mainly unfavourable) mutants that are present at any time, but do not survive in the long run.
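Eq. (11) can be checked numerically on a small example. The hedged Python sketch below reuses the Donsker-Varadhan formula for $I_U$ (an assumption about the form of the rate function, stated in the docstring) and verifies, for a hypothetical three-type model, that $\langle \alpha, R\rangle - I_U(\alpha)$ reproduces $\lambda = \langle \pi, R\rangle$, while randomly drawn distributions $\nu$ never exceed $\lambda$.

```python
import numpy as np
from scipy.linalg import eig
from scipy.optimize import minimize


def rate_function(nu, U):
    """Assumed Donsker-Varadhan form I_U(nu) = sup_{u>0} -sum_i nu_i (U u)_i / u_i,
    evaluated as sum_i nu_i |U_ii| - min_x sum_{i != j} nu_i U_ij exp(x_j - x_i)."""
    off = U - np.diag(np.diag(U))

    def g_and_grad(x):
        M = nu[:, None] * off * np.exp(x[None, :] - x[:, None])
        return M.sum(), M.sum(axis=0) - M.sum(axis=1)

    res = minimize(g_and_grad, np.zeros(len(nu)), jac=True, method="BFGS")
    return float(nu @ (-np.diag(U))) - res.fun


# Hypothetical three-type example.
R = np.array([1.0, 0.5, 0.2])
U = np.array([[-0.3, 0.2, 0.1],
              [0.1, -0.2, 0.1],
              [0.05, 0.15, -0.2]])
A = U + np.diag(R)

w, vl, vr = eig(A, left=True, right=True)
k = np.argmax(w.real)
lam = w[k].real
pi = np.abs(vl[:, k].real)
pi /= pi.sum()
h = np.abs(vr[:, k].real)
h /= pi @ h
alpha = pi * h                                   # ancestral distribution

value_at_alpha = alpha @ R - rate_function(alpha, U)

# Random competitors nu should do no better than lambda.
rng = np.random.default_rng(0)
values = [nu @ R - rate_function(nu, U) for nu in rng.dirichlet(np.ones(len(R)), size=200)]

print(lam, pi @ R, value_at_alpha)               # the three members of Eq. (11)
print(alpha @ R - pi @ R)                        # ancestral surplus, equal to I_U(alpha)
```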

We will see in Sec. 5 that this 'competition of distributions' can be made more concrete, namely, in terms of a competition of lines of descent, by considering the empirical distributions $L^\omega(t)$ of types along distinct lines $\omega$. But before we can embark on this, we must first develop a way of constructing trees, lines, and processes on lines in a consistent way; this will be taken up in the next Section.

It is interesting to note that the above variational principle resembles the thermodynamic maximum principles in statistical physics. Indeed, our reproduction rates may be identified with an energy, and the rate function with an entropy; in fact, the rate function for the continuous-time Markov chain $M(t)$ can be naturally derived from the usual entropy governing the so-called pair-empirical measure of a discrete-time Markov chain, cf. [20, Ch. IV].

3.2. Sequence space models 

The variational principle (11), valuable as it is conceptually, is not very useful if one aims at an explicit solution; this is because maximization is over a large space (the set of probability measures on $S$). However, it turns out that, in certain models of sequence evolution, this task boils down to a much simpler one if the original problem is dissected into two, one of which can be solved explicitly. Let us first describe this 'divide and conquer' strategy.

Assume that the type of an individual is characterized by a sequence of nucleotides, amino acids, matches/mismatches with respect to a reference sequence of nucleotides, or, in general, letters from some alphabet $\Sigma$. Thus $\Sigma = \{A, G, C, T\}$, $\{1, \dots, 20\}$, $\{0, 1\}$, or any other finite set¹. The natural type space is then $\Sigma^N$, the set of possible sequences of length $N$, where $N$ is typically large. However, if the mutation and reproduction rates are invariant under permutations of sequence sites, all relevant information on a sequence $\sigma = (\sigma_k)_{1 \le k \le N} \in \Sigma^N$ is already contained in its letter histogram (or letter composition)

$$H(\sigma) = \big(H_\ell(\sigma)\big)_{\ell \in \Sigma}, \qquad H_\ell(\sigma) = \sum_{k=1}^{N} I\{\sigma_k = \ell\}, \tag{12}$$

which indicates how often each letter $\ell$ shows up in $\sigma$. In other words, it is sufficient to look at the reduced type space

$$S = H(\Sigma^N) = \Big\{ i \in \mathbb{Z}^{\Sigma} \;\Big|\; i_\ell \ge 0 \text{ for } \ell \in \Sigma,\ \sum_{\ell \in \Sigma} i_\ell = N \Big\} \tag{13}$$

with $d = |\Sigma|$, which consists of all possible letter compositions. This lumping procedure (see Fig. 5) induces a model on $S$ that is again a Markov branching process; its reproduction rates $R = (R_i)_{i \in S}$ and mutation generator $U = (U_{ij})_{i,j \in S}$ are uniquely determined by the corresponding rates of the original process on $\Sigma^N$. Many models of sequence evolution allow for such a lumped representation; as a particularly realistic example, let us mention the mutation-selection model for regulatory DNA motifs [16], which also involves analysis of sequence data.

Fig. 5. Lumping a sequence space.

¹ As in the case of matches/mismatches, the formal alphabet $\Sigma$ need not coincide with the alphabet used in the biological description. The letters in the original biological sequence may, for example, even be replaced by $n$-tuples of matches/mismatches relative to $n$ reference sequences, as required in the treatment of Hopfield fitness functions [3, 13].
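The lumping map $H$ and the reduced type space $S$ are simple to enumerate, which also shows how much smaller $S$ is: it has $\binom{N+d-1}{d-1}$ elements, compared with $d^N$ sequences. The Python sketch below is illustrative only; the helper names `histogram` and `reduced_type_space` are hypothetical, and letters are counted in the order in which the alphabet is listed.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import comb


def histogram(sigma, alphabet):
    """Letter histogram H(sigma) of Eq. (12), listed in the order of `alphabet`."""
    counts = Counter(sigma)
    return tuple(counts[letter] for letter in alphabet)


def reduced_type_space(N, alphabet):
    """Reduced type space S = H(Sigma^N) of Eq. (13): one histogram per multiset
    of N letters, since the order of the sites is irrelevant."""
    return {histogram(m, alphabet) for m in combinations_with_replacement(alphabet, N)}


alphabet = "ACGT"                     # d = 4
N = 3
S_reduced = reduced_type_space(N, alphabet)
# Brute-force lumping of all 4**3 = 64 sequences yields the same set.
S_brute = {histogram(seq, alphabet) for seq in product(alphabet, repeat=N)}

print(histogram("ACGA", alphabet))    # (2, 1, 1, 0)
print(len(S_reduced), comb(N + len(alphabet) - 1, len(alphabet) - 1))   # 20 20
```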

To get back to the variational problem, we will classify the possible distributions $\nu \in \mathcal{P}(S)$ according to the value of their mean $\langle \nu, \mathrm{id}\rangle \in \mathbb{R}^d$; here $\mathrm{id}$ denotes the identity function on $S$ defined by $\mathrm{id}_i = i$ for all $i \in S$, and, in line with previous usage, the scalar product gives the expectation of this vector-valued function under the measure $\nu$. Keeping in mind that $S$ arises from lumping a sequence space as in Fig. 5, we think of $\langle \nu, \mathrm{id}\rangle$ as the expected value of a random letter composition with distribution $\nu$, i.e., the mean histogram if histograms have distribution $\nu$.

Let us now foliate the variational problem (11) according to these mean letter frequencies. That is, we write

$$\lambda = \max_{z \in \operatorname{conv} S} \mathcal{A}(z), \tag{14}$$

with $\mathcal{A} : \operatorname{conv} S \to \mathbb{R}$ given by

$$\mathcal{A}(z) := \max_{\nu \in \mathcal{P}(S):\ \langle \nu, \mathrm{id}\rangle = z} \big[\langle \nu, R\rangle - I_U(\nu)\big]. \tag{15}$$

Here we write $\operatorname{conv} S$ for the convex hull of $S$, that is, the set of convex combinations of elements of $S$, or, in other words, the set of all possible mean letter compositions. The (unique) maximizer of $\mathcal{A}$ is $\hat z := \langle \alpha, \mathrm{id}\rangle$, i.e., the mean ancestral letter composition.

The function $\mathcal{A}(z)$ describes the growth rate resulting from the competition between all distributions with mean letter composition $z$; we will therefore call it the constrained mean fitness of $z$. In analogy with the interpretation of the unconstrained variational principle (11), the competing distributions may be identified with empirical letter compositions along lines of descent, and $\mathcal{A}(z)$ will turn out to be the asymptotic growth rate of the lines with empirical letter histogram (close to) $z$; this will be shown in Prop. 2. It follows that the growth rate of the total population coincides with the growth rate $\mathcal{A}(\hat z)$ of the subpopulation consisting of all lines with empirical letter histogram close to the mean ancestral one.

Now, the main point is that $\mathcal{A}(z)$ can be calculated explicitly in two interesting situations, namely:

(1) All sites of the sequence mutate independently and according to the same Markov process in continuous time, and fitness is additive across sites. Thus $R_i = R(i)$ and $U_{ij} = U_{j-i}(i)$ are linear functions on $S$ (that will be extended to $\operatorname{conv} S$). In Thm. 2, we then obtain $\mathcal{A}(z)$ explicitly, and exactly, as

$$\mathcal{A}(z) = R(z) - \frac{1}{2}\sum_k \Big(\sqrt{U_k(z)} - \sqrt{U_{-k}(z)}\Big)^2 = \langle \nu^{(z)}, R\rangle - I_U\big(\nu^{(z)}\big), \tag{16}$$

where the sum is over all possible mutational steps $k$, and $\nu^{(z)} = \mathrm{Mult}_{N, z/N}$ is the multinomial distribution with mean $z$.
(2) The reproduction and mutation rates have a continuous approximation of the form

with functions $r$ and $u_k$ that are smooth enough. Under further technical conditions, an analogue of (16) will be obtained in Thm. 4, namely,

$$\mathcal{A}(z) = e(z) + O(N^{-1/3}), \tag{18}$$

where

$$e(z) = r(z) - \frac{1}{2}\sum_k \Big(\sqrt{u_k(z)} - \sqrt{u_{-k}(z)}\Big)^2. \tag{19}$$

Strictly speaking, the approximation (18) is only true when $e(z)$ is concave; otherwise $e(z)$ has to be replaced by its concave envelope, and the distribution attaining the constrained maximum $\mathcal{A}(z)$ will show distinct peaks. This behaviour, which indicates some kind of phase transition, will be the subject of Thm. 5. For $z = \hat z$, this phenomenon means that the total growth rate $\lambda$ is determined by two or more coexisting subpopulations with distinct empirical letter histograms.
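For a two-letter alphabet, case (1) can be checked against a direct eigenvalue computation. The hedged Python sketch below assumes hypothetical per-site rates `u01` ($0 \to 1$) and `u10` ($1 \to 0$), additive fitness $R_i = r_0 + s\,i$ (with $i$ the number of 1's), and hence lumped rates $U_{i,i+1} = u_{01}(N - i)$ and $U_{i,i-1} = u_{10}\, i$. With only the steps $k = \pm 1$, the factor $\frac12$ in (16) cancels the double counting of $\pm k$. The sketch maximizes $\mathcal{A}(z)$ over $[0, N]$ as in (14) and compares the result with the principal eigenvalue of the lumped generator and with the mean ancestral composition $\hat z$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical two-letter example: N sites, per-site rates u01 (0 -> 1) and
# u10 (1 -> 0), additive fitness R_i = r0 + s * i with i = number of 1's.
N, u01, u10 = 20, 0.3, 0.5
r0, s = 1.0, 0.1

# Lumped model on S = {0, ..., N}: U_{i,i+1} = u01 (N - i), U_{i,i-1} = u10 i.
i = np.arange(N + 1)
U = np.diag(u01 * (N - i[:-1]), 1) + np.diag(u10 * i[1:], -1)
U -= np.diag(U.sum(axis=1))
A = U + np.diag(r0 + s * i)
lam = np.max(np.linalg.eigvals(A).real)        # principal eigenvalue of the lumped model

# Mean ancestral composition z_hat = <alpha, id> from left/right eigenvectors.
w, vr = np.linalg.eig(A)
wl, vl = np.linalg.eig(A.T)
h = np.abs(vr[:, np.argmax(w.real)].real)
pi = np.abs(vl[:, np.argmax(wl.real)].real)
pi /= pi.sum()
h /= pi @ h
z_hat = (pi * h) @ i


def constrained_fitness(z):
    """Eq. (16) for this model: steps k = +1, -1 with U_{+1}(z) = u01 (N - z) and
    U_{-1}(z) = u10 z; the factor 1/2 cancels the double counting of +-k."""
    return r0 + s * z - (np.sqrt(u01 * (N - z)) - np.sqrt(u10 * z)) ** 2


res = minimize_scalar(lambda z: -constrained_fitness(z), bounds=(0.0, N), method="bounded")
lam_variational = -res.fun                     # max over conv S = [0, N], Eq. (14)
print(lam, lam_variational)
print(res.x, z_hat)                            # maximizer vs. mean ancestral composition
```

Because the sites are independent and fitness is additive, $\lambda$ here also equals $r_0 + N\lambda_{\text{site}}$, where $\lambda_{\text{site}}$ is the principal eigenvalue of the single-site matrix; this gives an independent consistency check.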
3.3. The quasispecies model

We will finally consider the coupled sequence space model on $\{0, 1\}^N$, known as the quasispecies model; more precisely, we will use a slightly adapted version of the original in [9]. It will be assumed that births and deaths occur at rates that are invariant under permutation of sites, and that mutations occur on the occasion of birth events, independently across sites, with probabilities $\mu/N$ and $\nu/N$ from 0 to 1 and vice versa, where $\mu$ and $\nu$ are positive and independent of $N$. Then, lumping may be performed into $S := \{0, 1, \dots, N\}$ by counting the number of 1's in a sequence. If the birth and death rates of the resulting model on $S$ have a continuous approximation analogous to that of (17), namely,

then

$$e(z) := b(z)\exp\Big[-\Big(\sqrt{\mu(1-z)} - \sqrt{\nu z}\Big)^2\Big] - d(z) \tag{20}$$

takes the role of $e(z)$ in (19).
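Whether $e(z)$ from (20) is concave, and hence whether a phase transition can occur, is easy to explore numerically. The Python sketch below is a hedged illustration with hypothetical ingredients that are not taken from the paper: $b(z) = 1 + 2e^{-30z}$ (a birth advantage near $z = 0$), constant $d(z) = 0.2$, and $\mu = \nu = 1$. It evaluates $e$ on a grid and computes its concave envelope with a monotone-chain upper hull; grid points where the envelope lies strictly above $e$ are those where the constrained maximum would be shared by distinct peaks.

```python
import numpy as np


def concave_envelope(x, y):
    """Upper concave envelope of the points (x_k, y_k), x increasing, via a
    monotone-chain upper hull, evaluated back on the grid x."""
    hull = []                                   # indices of hull vertices
    for k in range(len(x)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            # cross >= 0: vertex j lies on or below the chord from i to k
            cross = (x[j] - x[i]) * (y[k] - y[i]) - (y[j] - y[i]) * (x[k] - x[i])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(k)
    return np.interp(x, x[hull], y[hull])


# Hypothetical ingredients for Eq. (20); none of these values come from the paper.
mu, nu = 1.0, 1.0                               # mutation parameters mu, nu
z = np.linspace(0.0, 1.0, 2001)                 # z = fraction of 1's
b = 1.0 + 2.0 * np.exp(-30.0 * z)               # birth rate b(z), favouring z near 0
d = np.full_like(z, 0.2)                        # constant death rate d(z)

e = b * np.exp(-(np.sqrt(mu * (1.0 - z)) - np.sqrt(nu * z)) ** 2) - d   # Eq. (20)
env = concave_envelope(z, e)

flat = env - e > 1e-9                           # e lies strictly below its envelope here
print(z[np.argmax(env)], env.max())
if flat.any():
    print(z[flat].min(), z[flat].max())         # extent of the non-concave region
```

For these hypothetical values, $e$ dips between the region near $z = 0$ and the mutationally favoured region near $z = 1/2$, so the envelope contains a linear piece bridging the two.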

4. Trees, lines, and processes on lines 

To understand the probabilistic significance of the variational principle previewed above, it is necessary to develop a detailed picture of the branching process that includes the full family tree. However, to keep technicalities to a minimum, we confine ourselves, in the first subsection, to the parallel model; in this case, a particularly simple construction is available which is sufficient for our needs. A more versatile procedure for general splitting rules will be sketched in Subsection 4.2.

4.1. The parallel model 

Let us explain the construction for the parallel model, as illustrated in Fig. 6. The population is started by a single individual (the root) of type $i$. In a first step, we ignore all death events and consider only the splitting events. Then all lines are infinite and can be labeled by a sequence $\omega \in \{0, 1\}^{\mathbb{N}} =: \Omega$, where $\omega_n$ tells us whether the $n$-th offspring corresponds to the upper (0) or lower (1) branch in the graphical representation of the tree, or, equivalently, whether it is counted as 'first' or 'second' at birth. Next, individuals are defined as (finite) initial segments of the infinite lines, i.e., $x = (\omega_1, \dots, \omega_n)$ is an $n$-th generation individual. The empty initial string of length 0 corresponds to the root and is counted as generation 0. The set $X := \{\emptyset\} \cup \bigcup_{n \ge 1}\{0, 1\}^n$ thus comprises all individuals that may possibly occur (and do occur as long as death events are ignored).

A realization of the Markov branching process described informally in Sec. 2 may then be specified by associating with every line $\omega$ the times at which it splits, its type (as a function of time), and the time it dies (by a death event). For convenience, the construction proceeds in two steps: we first grow a tree by splitting and mutation alone (with the appropriate exponential waiting times); the death events are then superimposed in a second step to determine which lines are still alive. This way, lines that have already died live on virtually and may continue to divide and mutate. However, this does not influence the lines that are alive; only these constitute the realization of the branching process. In particular, we denote by $X(t) \subset X$ the set of individuals alive at time $t$; note that this is a mixture of various generations. (We remain a bit informal here; for one of the various possible ways of a rigorous construction, see [15].)
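The two-step construction can be mimicked in a few lines of code. The Python sketch below is our own hedged illustration, not the rigorous construction of [15]: each line splits at rate $B_i$ and mutates at rates $U_{ij}$; a death event (rate $D_i$) does not stop the line but only marks it as virtual, and children inherit that status. Labels are tuples over $\{0, 1\}$ as in the text, with `()` the root. The function name `grow_tree` and all rates are hypothetical.

```python
import random


def grow_tree(i0, t_max, B, D, U, rng):
    """Two-step construction of Sec. 4.1 for the parallel model (a sketch).

    Lines split at rate B[i] and mutate i -> j at rate U[i][j]; a death event
    (rate D[i]) does not stop a line but only marks it as virtual, and both
    children inherit the alive/virtual status of their parent. Returns a list
    of (label, type, alive) for all real and virtual individuals present at
    time t_max; labels are tuples over {0, 1}, () being the root.
    """
    population = []
    stack = [((), i0, True, 0.0)]              # (label, type, alive, time)
    while stack:
        label, i, alive, t = stack.pop()
        while True:
            mut = [(j, rate) for j, rate in enumerate(U[i]) if j != i and rate > 0]
            total = B[i] + D[i] + sum(rate for _, rate in mut)
            t += rng.expovariate(total) if total > 0 else float("inf")
            if t > t_max:                      # this individual is present at t_max
                population.append((label, i, alive))
                break
            u = rng.random() * total
            if u < B[i]:                       # split: the second child goes on the stack,
                stack.append((label + (1,), i, alive, t))
                label = label + (0,)           # the first child continues here
            elif u < B[i] + D[i]:              # death: the line lives on virtually
                alive = False
            else:                              # mutation, j chosen proportional to U[i][j]
                u -= B[i] + D[i]
                new_type = mut[-1][0]          # fallback against round-off
                for j, rate in mut:
                    if u < rate:
                        new_type = j
                        break
                    u -= rate
                i = new_type
    return population


rng = random.Random(42)
B = [1.0, 0.8]                                 # hypothetical birth rates
D = [0.3, 0.5]                                 # hypothetical death rates
U = [[0.0, 0.5], [0.3, 0.0]]                   # hypothetical mutation rates U[i][j]
pop = grow_tree(0, 3.0, B, D, U, rng)
X_t = sorted(label for label, _, alive in pop if alive)   # the population X(t)
print(len(pop), X_t)
```

Since every split replaces a label of length $n$ by two labels of length $n+1$, the labels present at any time satisfy $\sum_x 2^{-|x|} = 1$, a convenient consistency check.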

Fig. 6. The branching process with mutation and binary splitting. Bullets mark death events; line segments that are alive are shown in black, virtual ones in grey; types are indicated by various line styles. The (randomly chosen) representative line is marked fat; its initial segment shown here is the first child of the second child of the first child of the root, i.e., the individual $x = (0, 1, 0)$. Since it has experienced three splitting events, it is a third-generation individual; but it is virtual, as in fact already its mother $(0, 1)$ died. The 'black' tree is a realization of the branching process. The individuals alive at time $t$ constitute the population $X(t)$ (a mixture of various generations); here, $X(t) = \{(0, 0), (1, 1, 0), (1, 1, 1)\}$; the other individuals at time $t$ are virtual.

For each line $\omega$, we consider now the following families of random variables: $\{M^\omega(t)\}_{t\ge0}$, the type process, which associates with $t$ the type of $\omega$ at time $t$; $\{\beta^\omega(t)\}_{t\ge0}$, the number of birth events along $\omega$ before $t$; and $T^\omega$, the time line $\omega$ dies (if $\omega$ survives forever, this time is infinite). Both the birth and the death process depend on the type process, but not vice versa. The crucial information on $\{M^\omega(t)\}_{t\ge0}$ is contained in its empirical measure

$$L^\omega(t) := \frac1t\int_0^t \delta_{M^\omega(\tau)}\,d\tau, \tag{21}$$
cf. (9). For an individual $x$ at time $t$, the empirical measure only depends on the initial segment of $\omega$ that describes $x$. With this in mind, we will sometimes also write $L^x(t)$ rather than $L^\omega(t)$.

The above families of random variables are not independent between lines (they are dependent through common ancestry), but, by symmetry between the two offspring at every splitting event, they share the same marginal laws for all $\omega \in \Omega$. In particular, since mutation is not influenced by the reproduction events, the type process on any given line (regardless of the others) is a copy of the mutation process generated by $U$. Let us choose one particular such line $\omega^*$, for example, by setting $\omega^* = (000\ldots)$, or by tossing a coin. The line $\omega^*$ may or may not survive, but it will always be present at least virtually. We will call it the representative line for reasons to become clear in a moment, and set $\beta(t) := \beta^{\omega^*}(t)$, $M(t) := M^{\omega^*}(t)$, $T := T^{\omega^*}$, and $L(t) := L^{\omega^*}(t)$. We will now see that, once we know the laws of these quantities, they can tell us a lot about the entire tree.

The basic observation is that, in generation $n$, there are $2^n$ possible (real or virtual) individuals, all with the same marginal laws for the random variables just discussed. This allows us to express the expected size of a population started by a single type-$i$ individual at time $0$ as follows:

$$\mathbb E^i(\|Z(t)\|_1) = \mathbb E^i(|X(t)|) = \sum_{n\ge0} 2^n\,\mathbb P^i\big(\beta(t) = n,\ T > t\big) = \mathbb E^i\big(2^{\beta(t)}\,\mathbf 1\{T > t\}\big). \tag{22}$$

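Now, conditionally on $L(t)$, the event $\{T > t\}$ and the random variable $\beta(t)$ are independent, the former having probability $\exp(-t\langle L(t), D\rangle)$ and the latter having the Poisson distribution $\mathrm{Poi}_{t\langle L(t),B\rangle}$ with parameter $t\langle L(t), B\rangle$. Therefore,

$$\mathbb E\big(2^{\beta(t)} \,\big|\, L(t)\big) = \exp\big(t\langle L(t), B\rangle\big), \qquad \mathbb E\big(\mathbf 1\{T > t\} \,\big|\, L(t)\big) = \exp\big(-t\langle L(t), D\rangle\big)$$

(both independently of the type of the root), where the former relies on the fact that, for a random variable $Y$ with distribution $\mathrm{Poi}_a$, one has $\mathbb E(2^Y) = \sum_{k\ge0} 2^k e^{-a} a^k/k! = e^{-a}e^{2a} = e^a$. Therefore, (22) turns into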

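$$\begin{aligned}
\mathbb E^i(\|Z(t)\|_1) &= \mathbb E^i\Big(\mathbb E\big(2^{\beta(t)}\,\mathbf 1\{T > t\} \,\big|\, L(t)\big)\Big)
= \mathbb E^i\Big(\mathbb E\big(2^{\beta(t)} \,\big|\, L(t)\big)\,\mathbb E\big(\mathbf 1\{T > t\} \,\big|\, L(t)\big)\Big)\\
&= \mathbb E^i\big(e^{t\langle L(t),B\rangle}\,e^{-t\langle L(t),D\rangle}\big) = \mathbb E^i\big(e^{t\langle L(t),R\rangle}\big).
\end{aligned} \tag{23}$$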

Note that the remaining expectation (and the outer one where expectations are nested) is with respect to $L(t)$; the last step uses $R = B - D$ in the parallel model. We also remark that the underlying tree construction lurks behind the above derivation, but in the simple case at hand it need not be made more explicit.
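As a numerical sanity check of (23), the following Python sketch simulates only the type process along a single (representative) line of a hypothetical two-type parallel model, and compares the Monte Carlo average of $e^{t\langle L(t),R\rangle}$ with $\mathbb E^i(\|Z(t)\|_1) = (e^{tA}\mathbf 1)_i$, obtained here by integrating the first-moment equation $m' = Am$ with a Runge-Kutta scheme. The rates `u01`, `u10` and `R` are made-up illustrative values, not taken from the text.

```python
import math
import random


def line_weight(i, t, jump_rate, R, rng):
    """Simulate the two-type mutation chain on one line up to time t, starting
    in type i, and return exp(int_0^t R_{M(tau)} dtau) = exp(t <L(t), R>)."""
    state, clock, integral = i, 0.0, 0.0
    while True:
        hold = rng.expovariate(jump_rate[state])   # exponential holding time
        if clock + hold >= t:
            integral += (t - clock) * R[state]
            return math.exp(integral)
        integral += hold * R[state]
        clock += hold
        state = 1 - state                           # jump to the other type


def expected_size(i, t, A, steps=4000):
    """E^i ||Z(t)||_1 = (exp(tA) 1)_i via RK4 for m' = A m, m(0) = (1, 1)."""
    def f(x):
        return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    m, dt = [1.0, 1.0], t / steps
    for _ in range(steps):
        k1 = f(m)
        k2 = f([m[j] + dt / 2 * k1[j] for j in range(2)])
        k3 = f([m[j] + dt / 2 * k2[j] for j in range(2)])
        k4 = f([m[j] + dt * k3[j] for j in range(2)])
        m = [m[j] + dt / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]) for j in range(2)]
    return m[i]


u01, u10 = 1.0, 0.5                          # hypothetical mutation rates 0 -> 1, 1 -> 0
R = [0.8, -0.3]                              # hypothetical net rates R_i = B_i - D_i
A = [[R[0] - u01, u01], [u10, R[1] - u10]]   # first-moment generator A = U + R
t, paths = 2.0, 20000
rng = random.Random(12345)
mc = sum(line_weight(0, t, [u01, u10], R, rng) for _ in range(paths)) / paths
exact = expected_size(0, t, A)
print(f"Monte Carlo {mc:.4f}  vs  (exp(tA) 1)_0 = {exact:.4f}")
```

Only one line is ever simulated; the branching itself enters solely through the weight $e^{t\langle L(t),R\rangle}$, which is the point of (23).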
4.2. General splitting rules

We have, so far, restricted ourselves to the decoupled model with parallel mutation, reproduction and death. The crucial simplifying feature here is the fact that, forward in time on every line, we have a copy of the mutation process generated by $U$. Therefore, we could consider any line as representative.

Outside the parallel model, the decomposition A = U + R is formal 
to start with, and the generator U has no immediate interpretation. But 
with the help of a more advanced tree construction, one can again obtain a 
representative line with its type process $M(t)$ generated by $U$. We will only
give a rough sketch here; for the full picture we refer the reader to [15]. 

The construction relies on a so-called size-biased tree with random spine
(or trunk). The general concept was introduced in [21,30] and [28]; the particular
(continuous-time) version required here can be found in [15, Remark
4.2]. Informally, one constructs a modified tree with a randomly selected, 
distinguished line (called the trunk or spine), along which time runs at a dif- 
ferent rate and offspring are weighted according to their size; in particular, 
there is always at least one offspring along the trunk so that the trunk sur- 
vives forever. The children off the trunk get ordinary (unbiased) descendant 
trees; see Fig. 7. 




[Fig. 7: size-biased tree with its trunk; the only surviving label reads "random choice"]



Fig. 7. A realization of a size-biased tree with its trunk (the fat line). An individual of type $j$, off the trunk, has offspring $N_j$ with distribution $p_j$ after an exponential waiting time $\tau_j$ with mean $1/a_j$; an individual of type $i$ along the trunk bears offspring $N_i$ with biased distribution $\hat p_i$ after an exponential waiting time $\hat\tau_i$ with mean $1/(a_i\,\mathbb E(\|N_i\|_1))$.



More precisely, for each type $i \in S$, we introduce the size-biased offspring distribution

$$\hat p_i(n) := \frac{\|n\|_1\,p_i(n)}{\mathbb E(\|N_i\|_1)}, \qquad n \in \mathbb Z_{\ge0}^S.$$

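Starting at the root, an individual of type $i$ on the trunk waits for an exponential time with parameter $a_i\,\mathbb E(\|N_i\|_1)$ and then produces offspring $N_i$ according to $\hat p_i$; one of these offspring is chosen randomly (with equal weight) as the successor on the trunk. It is easily verified that the type process on the trunk is a Markov chain generated by $U$: the trunk jumps from $i$ to $j \neq i$ at rate $a_i\,\mathbb E(\|N_i\|_1)\sum_n \hat p_i(n)\,n_j/\|n\|_1 = a_i\,\mathbb E(N_{ij})$, which is the off-diagonal entry $A_{ij} = U_{ij}$ of the first-moment generator. The trunk takes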
the role of the representative line, and the considerations of the previous 
Subsection carry over. We do not spell this out here explicitly; for the complete
picture and many details, in particular on how the trunk may be used
to extract further information about the tree, see [15]. To avoid misunder- 
standings, we would like to emphasize that the size-biased tree as applied 
to the parallel model does not reduce to the simple special construction of 
the previous Subsection. In particular, unlike the representative line of this 
construction, the trunk of the size-biased tree is certain to survive forever. 
However, both constructions share the essential property that the mutation 
process along the trunk or representative line, respectively, is generated by 
U, and the fact that many properties of the entire tree may be extracted 
from this distinguished line. 
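The claim that the trunk's type process is generated by $U$ can be checked with exact rational arithmetic. The sketch below uses a hypothetical two-type splitting rule (the rates `a` and offspring laws `p` are illustrative only) and assumes the first-moment generator in the usual form $A_{ij} = a_i(\mathbb E(N_{ij}) - \delta_{ij})$, so that $U_{ij} = A_{ij}$ for $i \neq j$. It computes the trunk's jump rates from the size-biased law $\hat p_i$ and the uniform choice of the successor.

```python
from fractions import Fraction as Fr

# Hypothetical splitting rule: a type-i individual lives an Exp(a[i]) time and
# is then replaced by the offspring vector n = (n_0, n_1) with probability p[i][n].
a = [Fr(1), Fr(2)]
p = [
    {(0, 0): Fr(1, 4), (2, 0): Fr(1, 2), (1, 1): Fr(1, 4)},
    {(0, 1): Fr(1, 3), (0, 2): Fr(1, 3), (1, 2): Fr(1, 3)},
]
TYPES = range(2)


def mutation_rates():
    """Off-diagonal U_ij = A_ij = a_i E(N_ij), assuming A_ij = a_i (E N_ij - delta_ij)."""
    return {(i, j): a[i] * sum(q * n[j] for n, q in p[i].items())
            for i in TYPES for j in TYPES if i != j}


def trunk_rates():
    """Jump rates i -> j (j != i) of the type process along the trunk."""
    rates = {(i, j): Fr(0) for i in TYPES for j in TYPES if i != j}
    for i in TYPES:
        mean_size = sum(q * sum(n) for n, q in p[i].items())   # E ||N_i||_1
        clock = a[i] * mean_size                               # trunk waits Exp(a_i E||N_i||_1)
        for n, q in p[i].items():
            size = sum(n)
            if size == 0:
                continue                                       # size-biased weight is 0
            q_hat = size * q / mean_size                       # \hat p_i(n)
            for j in TYPES:
                if j != i:
                    # uniformly chosen successor has type j with probability n_j / size
                    rates[(i, j)] += clock * q_hat * Fr(n[j], size)
    return rates


print(mutation_rates())
print(trunk_rates())
```

Exact fractions are used so that the two rate tables can be compared for equality rather than up to rounding.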



5. Variational characterization of the asymptotic growth rate 

We are now in a position to derive the variational characterization (11) of 
the asymptotic growth rate $\lambda$. The idea is to observe both the mutation
process and the reproduction rate along the representative line of the tree. 
The appropriate tool for analyzing the tradeoff between these processes is 
the large deviation principle for the mutation process. 

5.1. Using the large deviation principle 

Let us, for the moment, restrict ourselves to the parallel model; we will see later that our results hold automatically for general splitting rules. For the parallel model, we can combine (6) and (23) to obtain

$$\lambda = \lim_{t\to\infty}\frac1t\log\mathbb E^i\big(e^{t\langle L(t),R\rangle}\big) = \lim_{t\to\infty}\frac1t\log\mathbb E^i\Big(\exp\Big[\int_0^t R_{M(\tau)}\,d\tau\Big]\Big), \tag{24}$$

that is, the growth rate can be determined by observing the types and the 
associated reproduction rates along the representative line. The competition 
between reproduction and mutation will lead to a variational formula for 
$\lambda$, which can immediately be derived from the variational formulas of large
deviation theory. The basic fact is the following large deviation principle for
$L(t)$; see [20, Ch. III.1 and IV.4] or [7, Ch. 1.2 and 3.1].

Proposition 1. The empirical measure $L(t)$ of a continuous-time Markov chain on a finite state space $S$ with irreducible generator $U$ satisfies the large deviation principle (LDP) with rate function

$$I_U(\nu) := \sup_{v>0}\Big[-\Big\langle \nu, \frac{Uv}{v}\Big\rangle\Big], \qquad \nu \in \mathcal P(S), \tag{25}$$

where the supremum is taken over all $v \in \mathbb R^S_{>0}$, and the fraction is to be understood component-wise, i.e., $Uv/v$ is the vector with components $(Uv)_i/v_i$. More explicitly, the LDP means that

$$\limsup_{t\to\infty}\frac1t\log\mathbb P\big(L(t)\in C\big) \le -\inf_{\nu\in C} I_U(\nu)$$

for any closed set $C \subset \mathcal P(S)$, and

$$\liminf_{t\to\infty}\frac1t\log\mathbb P\big(L(t)\in O\big) \ge -\inf_{\nu\in O} I_U(\nu)$$

for any open set $O \subset \mathcal P(S)$. Furthermore, $I_U$ is continuous, strictly convex and nonnegative, and $I_U(\nu) = 0$ precisely for $\nu = \mu$, the stationary distribution of $U$.

For an informal statement of the LDP recall (10). (Although we have stated 
the LDP here only for the special case we need, it is indeed quite a general 
principle that applies to many common types of random variables. We refer 
the interested reader to the monographs [7] or [20].) 
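For a two-state chain (which is always reversible) the supremum in (25) reduces to a one-dimensional search, because $-\langle\nu, Uv/v\rangle$ is invariant under $v \mapsto cv$. The Python sketch below, with hypothetical rates `u01` and `u10`, compares this numerical supremum with the closed form $(\sqrt{\nu_0 u_{01}} - \sqrt{\nu_1 u_{10}})^2$, the two-state case of formula (30) in Subsec. 5.2, and confirms that $I_U$ vanishes at the stationary distribution $\mu = (u_{10}, u_{01})/(u_{01} + u_{10})$.

```python
import math

u01, u10 = 1.0, 0.5          # hypothetical mutation rates 0 -> 1 and 1 -> 0


def rate_sup(nu, lo=-20.0, hi=20.0, iters=200):
    """I_U(nu) = sup_{v > 0} -<nu, Uv/v> for U = [[-u01, u01], [u10, -u10]].
    By scale invariance v = (1, e^x) suffices; the objective is concave in x."""
    def objective(x):
        v0, v1 = 1.0, math.exp(x)
        uv0 = u01 * (v1 - v0)                # (Uv)_0
        uv1 = u10 * (v0 - v1)                # (Uv)_1
        return -(nu[0] * uv0 / v0 + nu[1] * uv1 / v1)
    for _ in range(iters):                   # ternary search for the maximum
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


def rate_closed(nu):
    """Two-state closed form (sqrt(nu_0 u01) - sqrt(nu_1 u10))^2."""
    return (math.sqrt(nu[0] * u01) - math.sqrt(nu[1] * u10)) ** 2


mu = (u10 / (u01 + u10), u01 / (u01 + u10))  # stationary distribution of U
for nu in [(0.3, 0.7), (0.9, 0.1), mu]:
    print(nu, rate_sup(nu), rate_closed(nu))
```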

Returning to (24), we now see that, on the right-hand side, the exponential factor $e^{t\langle\nu,R\rangle}$ is integrated over a probability measure that behaves essentially like $e^{-tI_U(\nu)}$. It may thus be evaluated by Varadhan's lemma on the asymptotics of exponential integrals, which is a far-reaching generalization of Laplace's method; see [20, Thm. III.13] or [7, Thm. 4.3.1]. Specifically, we obtain the key formula

$$\lambda = \lim_{t\to\infty}\frac1t\log\int_{\mathcal P(S)} e^{t\langle\nu,R\rangle}\,\mathbb P^i\big(L(t)\in d\nu\big) = \max_{\nu\in\mathcal P(S)}\big[\langle\nu,R\rangle - I_U(\nu)\big], \tag{26}$$

which may be understood as a 'largest exponent wins' principle. Let us 
continue with a series of comments. 

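5.1.1 Relation to the retrospective process. The maximum principle (26), though derived by considering the branching process forward in time, is directly connected to the retrospective process of (8). In analogy with (25), the rate function for the empirical measure of the retrospective process (generated by $G$ of (8)) reads $I_G(\nu) = \sup_{w>0}\big[-\langle\nu, (Gw)/w\rangle\big]$. This, however, is closely related to $I_U(\nu)$. Indeed, setting $v = (v_i)_{i\in S}$ with $v_i = h_i w_i$ (as $h > 0$, $v$ ranges over all positive vectors when $w$ does), we can write

$$\Big\langle\nu, \frac{Gw}{w}\Big\rangle = \sum_{i,j\in S}\nu_i\,(A_{ij} - \lambda\delta_{ij})\,\frac{h_j w_j}{h_i w_i} = \sum_{i,j\in S}\nu_i\,(A_{ij} - \lambda\delta_{ij})\,\frac{v_j}{v_i} = \langle\nu,R\rangle - \lambda + \Big\langle\nu,\frac{Uv}{v}\Big\rangle,$$

whence

$$I_G(\nu) = \sup_{w>0}\Big[-\Big\langle\nu,\frac{Gw}{w}\Big\rangle\Big] = \lambda - \langle\nu,R\rangle + \sup_{v>0}\Big[-\Big\langle\nu,\frac{Uv}{v}\Big\rangle\Big] = \lambda - \langle\nu,R\rangle + I_U(\nu). \tag{27}$$

Again, $I_G(\nu)$ is nonnegative, strictly convex, and vanishes if and only if $\nu = \alpha$, the stationary distribution of $G$. Since (27) says that $\langle\nu,R\rangle - I_U(\nu) = \lambda - I_G(\nu)$, it follows that the ancestral distribution $\alpha$ is the unique maximizer in (26). We may thus summarize our findings in the following theorem (recall (6) for the first identity).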

Theorem 1. The forward-time stationary distribution $\pi$, the reproduction rate $R$, the asymptotic growth rate $\lambda$, the mutation process $U$ and the ancestral distribution $\alpha$ are linked via the equation

$$\langle\pi,R\rangle = \lambda = \max_{\nu\in\mathcal P(S)}\big[\langle\nu,R\rangle - I_U(\nu)\big] = \langle\alpha,R\rangle - I_U(\alpha). \tag{28}$$
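Theorem 1 can be illustrated on a hypothetical two-type model with made-up rates. The Python sketch computes $\lambda$ and the ancestral distribution $\alpha_i \propto \pi_i h_i$ in closed form from the $2\times2$ matrix $A$, maximizes $\langle\nu,R\rangle - I_U(\nu)$ over $\nu = (x, 1-x)$ by ternary search (the objective is strictly concave), and checks that the maximum equals $\lambda$ and is attained at $\alpha$.

```python
import math

u01, u10 = 1.0, 0.5                          # hypothetical mutation rates
R = (0.8, -0.3)                              # hypothetical reproduction rates
A = [[R[0] - u01, u01], [u10, R[1] - u10]]   # first-moment generator A = U + R

# principal eigenvalue of the 2x2 matrix A (cancellation-free form)
lam = (A[0][0] + A[1][1]) / 2 + math.sqrt(((A[0][0] - A[1][1]) / 2) ** 2 + A[0][1] * A[1][0])

# unnormalised left (pi) and right (h) eigenvectors; alpha_i is proportional to pi_i h_i
pi = (A[1][0], lam - A[0][0])
h = (A[0][1], lam - A[0][0])
alpha0 = pi[0] * h[0] / (pi[0] * h[0] + pi[1] * h[1])


def objective(x):
    """<nu, R> - I_U(nu) for nu = (x, 1 - x), with the two-state rate function."""
    rate = (math.sqrt(u01 * x) - math.sqrt(u10 * (1 - x))) ** 2
    return x * R[0] + (1 - x) * R[1] - rate


lo, hi = 0.0, 1.0
for _ in range(200):                         # ternary search on a strictly concave function
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if objective(m1) < objective(m2):
        lo = m1
    else:
        hi = m2
x_star = (lo + hi) / 2
print(lam, objective(x_star))                # maximum of the variational problem vs lambda
print(alpha0, x_star)                        # maximiser vs ancestral weight of type 0
```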
5.1.2 The mutation rate function at the ancestral distribution. Thm. 1 yields the additional relation

$$I_U(\alpha) = \langle\alpha,R\rangle - \lambda = \langle\pi, Rh - \lambda h\rangle = \langle\pi, -Uh\rangle = \sum_{i,j\in S}\pi_i U_{ij}(h_i - h_j)$$

(using $\alpha_i = \pi_i h_i$, $\langle\pi,h\rangle = 1$, $Ah = \lambda h$ and $\sum_j U_{ij} = 0$), i.e., the value of the mutational rate function at the optimum equals the long-term loss of offspring due to mutation, wherefore it was previously termed mutational loss function; see [18, Sec. 5 and Appendix A] for the biological implications.
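The identity above can be checked numerically. The Python sketch assumes a hypothetical three-type model whose mutation generator is built as $U_{ij} = s_{ij}\mu_j$ with symmetric $s$, so that $U$ is reversible and the closed form (30) of Subsec. 5.2 may be used for $I_U$. The quantities $\lambda$, $\pi$ and $h$ are obtained by power iteration on $A + cI$, a choice made here only to keep the example self-contained.

```python
import math

mu = [0.5, 0.3, 0.2]                                     # hypothetical reversible distribution
s = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]  # hypothetical symmetric factors
R = [1.0, 0.2, -0.5]                                     # hypothetical reproduction rates
n = len(mu)
U = [[s[i][j] * mu[j] for j in range(n)] for i in range(n)]   # mu_i U_ij = mu_j U_ji
for i in range(n):
    U[i][i] = -sum(U[i][j] for j in range(n) if j != i)
A = [[U[i][j] + (R[i] if i == j else 0.0) for j in range(n)] for i in range(n)]


def perron(M, iters=5000):
    """Principal eigenvalue and left/right eigenvectors (each summing to 1) of an
    irreducible matrix with nonnegative off-diagonal part, by power iteration."""
    m = len(M)
    c = max(-M[i][i] for i in range(m)) + 1.0            # makes M + cI nonnegative
    B = [[M[i][j] + (c if i == j else 0.0) for j in range(m)] for i in range(m)]
    left, right = [1.0] * m, [1.0] * m
    for _ in range(iters):
        left = [sum(left[i] * B[i][j] for i in range(m)) for j in range(m)]
        right = [sum(B[i][j] * right[j] for j in range(m)) for i in range(m)]
        sl, sr = sum(left), sum(right)
        left, right = [x / sl for x in left], [x / sr for x in right]
    lam = sum(sum(B[i][j] * right[j] for j in range(m)) for i in range(m)) - c
    return lam, left, right


lam, pi, h = perron(A)
scale = sum(p * q for p, q in zip(pi, h))
h = [x / scale for x in h]                               # normalise so that <pi, h> = 1
alpha = [p * q for p, q in zip(pi, h)]                   # ancestral distribution

rate_alpha = 0.5 * sum((math.sqrt(alpha[i] * U[i][j]) - math.sqrt(alpha[j] * U[j][i])) ** 2
                       for i in range(n) for j in range(n) if i != j)   # I_U(alpha) via (30)
gap = sum(a * r for a, r in zip(alpha, R)) - lam                         # <alpha, R> - lambda
loss = sum(pi[i] * U[i][j] * (h[i] - h[j]) for i in range(n) for j in range(n))
print(rate_alpha, gap, loss)
```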

5.1.3 Balance of mutation and reproduction. On every line $\omega$, the mutation process runs randomly through a sequence of histories, and hence determines an evolution of empirical measures $L^\omega(t) \in \mathcal P(S)$. As $t \to \infty$, the empirical measures $\nu = L^\omega(t)$ that differ from the stationary distribution $\mu$ of $U$ become exponentially less probable at asymptotic rate $I_U(\nu)$. In particular, $\mu$ is the (almost-sure) long-term time average on the line $\omega$ in the forward direction of time. In spite of this, the long-term population average $\pi$ of (5) differs from $\mu$, in general. This is because mutation is counterbalanced by reproduction, at rate $R_{M^\omega(t)}$ at instant $t$, and at mean rate $\langle L^\omega(t), R\rangle$ for the entire line segment up to time $t$. We note that in realistic biological models the largest reproduction rates typically belong to types that are improbable under the stationary mutation distribution $\mu$ ('good' types are rare under mutation alone, otherwise it would not require selection to establish them!). Hence, empirical measures with a large mean reproduction rate tend to differ markedly from $\mu$. The resulting tradeoff between the mean reproduction rate of a line and its asymptotic rate of decay is won by those lines $\omega$ for which $L^\omega(t) = \nu$ maximizes the difference $\langle\nu,R\rangle - I_U(\nu)$. According to Thm. 1, these are precisely the lines having the ancestral distribution $\alpha$ as their time average. It is therefore this $\alpha$ that is successful in the long run and that we see when looking back into the past.

5.1.4 Extension to general splitting rules. In our proof of Thm. 1 above, 
we used a probabilistic argument that relied on the parallel model and the
associated tree construction. So it might seem that this theorem is limited to
this particular model. Note, however, that all quantities appearing in Thm. 1
are solely determined by the first-moment generator A of the process, so that
it is a property of A rather than the underlying process. For an arbitrary
Markov branching process, we can use the formal decomposition A = U + R
of its first-moment generator to build a parallel model with the same A.
Since the theorem holds for the latter process, it also holds for the former;
this is some kind of "invariance principle". All that is lost is the probabilistic
interpretation given in the previous comment; such an interpretation may 
be regained with the help of the size-biased tree construction of Subsec. 4.2, 
but is then more involved. 
5.2. Reversible mutation rates, and symmetrization

We will now discuss the important special case that U is reversible, in that 
$\mu_i U_{ij} = \mu_j U_{ji}$ for all $i, j \in S$. This is assumed in most models of nucleotide
evolution, see, e.g., [12, Ch. 13]. The interest in this case comes from the 
following facts. 

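5.2.1 Explicit form of the rate function. For reversible $U$, the maximization in (25) can be carried out explicitly, so that the rate function takes the closed form [20, p. 50, Ex. IV.24]

$$I_U(\nu) = -\Big(\sqrt{\nu/\mu},\, U\sqrt{\nu/\mu}\Big)_\mu; \tag{29}$$

here both the square root and the fraction are to be read componentwise, and $(u,v)_\mu := \sum_i u_i v_i \mu_i$ denotes the scalar product in $L^2(\mu)$ for vectors $u$ and $v$, so that (29) is the Dirichlet form of $U$ evaluated at $\sqrt{\nu/\mu}$. (It is an interesting fact that no such simplification exists for reversible Markov chains in discrete time.) Noting that $\mu_i > 0$ for all $i \in S$ by irreducibility, using the reversibility in the form $\sqrt{\mu_i/\mu_j}\,U_{ij} = \sqrt{U_{ij}U_{ji}}$, and recalling that $U_{ii} = -\sum_{j\neq i} U_{ij}$, one readily finds that Eq. (29) is equivalent to

$$I_U(\nu) = \frac12\sum_{i,j\in S:\ i\neq j}\Big(\sqrt{\nu_i U_{ij}} - \sqrt{\nu_j U_{ji}}\Big)^2. \tag{30}$$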
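A quick numerical comparison of (29) and (30) is sketched below, assuming a hypothetical reversible generator of the form $U_{ij} = s_{ij}\mu_j$ with symmetric $s_{ij}$ (any such $U$ satisfies $\mu_i U_{ij} = \mu_j U_{ji}$); the numbers are illustrative only.

```python
import math

mu = [0.5, 0.3, 0.2]                                     # hypothetical reversible distribution
s = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]  # hypothetical symmetric factors
n = len(mu)
U = [[s[i][j] * mu[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
for i in range(n):
    U[i][i] = -sum(U[i])                                 # zero row sums (diagonal still 0 here)


def rate_29(nu):
    """I_U(nu) = -(f, U f)_mu with f = sqrt(nu / mu), cf. (29)."""
    f = [math.sqrt(nu[i] / mu[i]) for i in range(n)]
    Uf = [sum(U[i][j] * f[j] for j in range(n)) for i in range(n)]
    return -sum(f[i] * Uf[i] * mu[i] for i in range(n))


def rate_30(nu):
    """I_U(nu) = 1/2 sum_{i != j} (sqrt(nu_i U_ij) - sqrt(nu_j U_ji))^2, cf. (30)."""
    return 0.5 * sum((math.sqrt(nu[i] * U[i][j]) - math.sqrt(nu[j] * U[j][i])) ** 2
                     for i in range(n) for j in range(n) if i != j)


for nu in ([0.2, 0.3, 0.5], [0.6, 0.1, 0.3], mu):
    print(nu, rate_29(nu), rate_30(nu))
```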

5.2.2 Estimation of the reproduction rate from the ancestral distribution. The reversibility of $U$ immediately implies that the vector $\mu h := (\mu_i h_i)_{i\in S}$ is a left eigenvector of $A = U + R$ for the principal eigenvalue $\lambda$, cf. [3]; indeed, $\mu_i U_{ij} = \mu_j U_{ji}$ gives $\sum_i \mu_i h_i A_{ij} = \mu_j (Ah)_j = \lambda\mu_j h_j$. Hence $\pi = \mu h$ up to a normalization factor, and therefore $\alpha = \mu h^2$, or $h = \sqrt{\alpha/\mu}$, again up to a normalization factor. (As before, the square root and the fraction are to be read componentwise.) This in turn means that $\alpha$, together with $\mu$, determines the reproduction rate $R$ up to an additive constant. Indeed, suppose that $R$ and $R'$ are two reproduction rates (for the same mutation matrix $U$) having the same ancestral distribution $\alpha = \alpha'$. Then $h = h'$ (after normalization), whence $(R - R')h = (\lambda - \lambda')h$. As $h$ is strictly positive, it follows that all components of $R - R'$ agree.
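The reconstruction of $R$ from $\alpha$ and $\mu$ can be sketched as follows for a hypothetical reversible three-type model. Since $Ah = \lambda h$ gives $R_i = \lambda - (Uh)_i/h_i$, evaluating $-(Uh)_i/h_i$ with $h = \sqrt{\alpha/\mu}$ recovers $R$ up to the additive constant $\lambda$. Power iteration is used only to produce a test input $\alpha$ from a made-up "true" $R$.

```python
import math

mu = [0.5, 0.3, 0.2]                                     # hypothetical reversible distribution
s = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]  # hypothetical symmetric factors
R = [1.0, 0.2, -0.5]                                     # hypothetical "true" reproduction rates
n = len(mu)
U = [[s[i][j] * mu[j] for j in range(n)] for i in range(n)]
for i in range(n):
    U[i][i] = -sum(U[i][j] for j in range(n) if j != i)
A = [[U[i][j] + (R[i] if i == j else 0.0) for j in range(n)] for i in range(n)]


def perron(M, iters=5000):
    """Principal eigenvalue and normalised left/right eigenvectors by power iteration."""
    m = len(M)
    c = max(-M[i][i] for i in range(m)) + 1.0
    B = [[M[i][j] + (c if i == j else 0.0) for j in range(m)] for i in range(m)]
    left, right = [1.0] * m, [1.0] * m
    for _ in range(iters):
        left = [sum(left[i] * B[i][j] for i in range(m)) for j in range(m)]
        right = [sum(B[i][j] * right[j] for j in range(m)) for i in range(m)]
        sl, sr = sum(left), sum(right)
        left, right = [x / sl for x in left], [x / sr for x in right]
    lam = sum(sum(B[i][j] * right[j] for j in range(m)) for i in range(m)) - c
    return lam, left, right


lam, pi, h_true = perron(A)
alpha = [p * q for p, q in zip(pi, h_true)]
alpha = [x / sum(alpha) for x in alpha]                  # the "observed" ancestral distribution

# reconstruction from alpha, mu and U only
h = [math.sqrt(alpha[i] / mu[i]) for i in range(n)]
R_rec = [-sum(U[i][j] * h[j] for j in range(n)) / h[i] for i in range(n)]
offsets = [R[i] - R_rec[i] for i in range(n)]
print(offsets, lam)                                      # every offset should equal lambda
```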

5.2.3 Symmetrized mutation rates. For reversible $U$, one can introduce the matrix $\tilde A := (\tilde A_{ij})_{i,j\in S}$ by $\tilde A_{ij} = \sqrt{\mu_i}\,A_{ij}/\sqrt{\mu_j}$, which is symmetric and has the same spectrum as $A = U + R$. The maximum principle of Thm. 1 can therefore also be derived from the Rayleigh-Ritz (or Courant-Fischer) variational principle for the leading eigenvalue of $\tilde A$; see [3, Sec. 2]. We emphasize, however, that the large deviation approach to (26) is not tied to reversible matrices and, as we have shown above, admits a natural interpretation in terms of the underlying family tree. Nevertheless, we will take advantage of the symmetrization $\tilde A$ in Sect. 7 below. In particular, we will use the (unique) decomposition $\tilde A = F + E$ into a symmetric Markov generator $F = (F_{ij})_{i,j\in S}$, defined through

$$F_{ij} := \begin{cases} \sqrt{U_{ij}U_{ji}} = F_{ji}, & i \neq j,\\[2pt] -\sum_{k\in S\setminus\{i\}}\sqrt{U_{ik}U_{ki}}, & i = j, \end{cases} \tag{31}$$
and a diagonal matrix $E := \mathrm{diag}(E_i \mid i \in S)$ with elements¹

$$E_i := \tilde A_{ii} - F_{ii} = R_i + U_{ii} - F_{ii} = R_i + \sum_{j\in S\setminus\{i\}}\Big(\sqrt{U_{ij}U_{ji}} - U_{ij}\Big). \tag{32}$$
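The decomposition $\tilde A = F + E$ of (31) and (32) is easy to verify numerically. The sketch below assumes a hypothetical reversible three-type generator $U_{ij} = s_{ij}\mu_j$ and made-up reproduction rates, and checks that $\tilde A$ is symmetric, that $F$ has zero row sums, and that $F + E$ reproduces $\tilde A$.

```python
import math

mu = [0.5, 0.3, 0.2]                                     # hypothetical reversible distribution
s = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]  # hypothetical symmetric factors
R = [1.0, 0.2, -0.5]                                     # hypothetical reproduction rates
n = len(mu)
U = [[s[i][j] * mu[j] for j in range(n)] for i in range(n)]
for i in range(n):
    U[i][i] = -sum(U[i][j] for j in range(n) if j != i)
A = [[U[i][j] + (R[i] if i == j else 0.0) for j in range(n)] for i in range(n)]

# symmetrised matrix tilde A_ij = sqrt(mu_i) A_ij / sqrt(mu_j)
At = [[math.sqrt(mu[i]) * A[i][j] / math.sqrt(mu[j]) for j in range(n)] for i in range(n)]

# F from (31): symmetric Markov generator
F = [[math.sqrt(U[i][j] * U[j][i]) if i != j else 0.0 for j in range(n)] for i in range(n)]
for i in range(n):
    F[i][i] = -sum(F[i][j] for j in range(n) if j != i)

# E from (32): diagonal remainder
E = [R[i] + sum(math.sqrt(U[i][j] * U[j][i]) - U[i][j] for j in range(n) if j != i)
     for i in range(n)]

sym_err = max(abs(At[i][j] - At[j][i]) for i in range(n) for j in range(n))
dec_err = max(abs(At[i][j] - F[i][j] - (E[i] if i == j else 0.0))
              for i in range(n) for j in range(n))
row_sums = [sum(row) for row in F]
print(sym_err, dec_err, row_sums)
```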




6. Unfolding the variational principle 

As we have seen, the maximum principle of Thm. 1 provides some general
insight into the competition, and resulting tradeoff, between mutation and
reproduction. In general, however, it cannot be solved explicitly. This is be-
cause both the maximization over the space $\mathcal P(S)$, and the eigenvalue equa-
tions determining $\pi$, $h$ and thus $\alpha$, are $|S|$-dimensional, and $S$ is typically
large. It is thus natural to ask whether one can obtain a low-dimensional
variational principle in a specific setting. In the rest of this paper we will 
therefore confine ourselves to genetic models of sequence type where each 
type is specified by a sequence of letters from a finite alphabet. The varia- 
tional problem can then be split into two simpler ones, a constrained vari- 
ational principle with fixed mean letter composition, and a maximization 
over all possible constraints. In some cases, each of these two subproblems 
may be treated explicitly or, at least, approximately. 

6.1. Lumping of sequence types, or: Choice of a type space 

As previewed in Subsection 3.2, we will now assume that the type of an individual is characterized by a sequence of letters from some finite alphabet $\Sigma$, which leads to the type space $\Sigma^N$. If we assume that the mutation and reproduction rates are invariant under permutations of sequence sites, as we will do in what follows, this sequence space can be lumped into the smaller space

$$S := \Big\{ i \in \mathbb Z_{\ge0}^{\Sigma} : \sum_{\ell\in\Sigma} i_\ell = N \Big\}, \tag{33}$$

recall Fig. 5. For example, this is possible for sequence space models with parallel mutation and reproduction, in which

(L1) all sites mutate independently and according to the same (Markov) process (a natural first assumption made in many models of sequence evolution) and

(L2) the fitness function is invariant under permutation of sites (a less natural, but still common assumption that applies, for example, if fitness only depends on the sequence through the number of mutated positions (i.e., the Hamming distance) relative to a reference sequence, often termed the 'wildtype');

¹ The corresponding equation in [3, Sec. 2], namely, the second-last equation on p. 88, is erroneous and should be corrected accordingly.
see, e.g., [18,14] or [16] for previous work on this case. (As an alternative to the choice (33), one can use the constraint $\sum_{\ell\in\Sigma} i_\ell = N$ to remove an element $a \in \Sigma$ by setting $\Sigma^* = \Sigma\setminus\{a\}$ and work instead with

$$S^* = \Big\{ i \in \mathbb Z^{\Sigma^*} : i_\ell \ge 0 \text{ for } \ell\in\Sigma^*,\ \sum_{\ell\in\Sigma^*} i_\ell \le N \Big\}, \tag{34}$$

where now $d = |\Sigma^*| = |\Sigma| - 1$.)

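Specifically, if the reproduction and mutation rates on $\Sigma^N$ are given by $\mathcal R_\sigma$ and $\mathcal U_{\sigma\tau}$ ($\sigma, \tau \in \Sigma^N$), then, by permutation invariance, there is a vector $R = (R_i)_{i\in S}$ and a Markov generator $U = (U_{ij})_{i,j\in S}$ so that $\mathcal R_\sigma = R_{H(\sigma)}$ and $\sum_{\tau:\,H(\tau) = j}\mathcal U_{\sigma\tau} = U_{H(\sigma),j}$ for all $\sigma \in \Sigma^N$ and $j \in S$; here $H$ is as in (12). $R$ and $U$ then define a Markovian branching process with type space $S$.²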

In fact, assumption (L1) even implies that the mutation rates $U_{ij}$ of the lumped model are linear in $i \in S$ (or affine in $i \in S^*$). This is seen as follows: If $w_{\ell m}$ is the mutation rate (at every site) from letter $\ell$ to letter $m$, then the corresponding transition in the lumped model (based on $\Sigma$) is $i \to i - e_\ell + e_m$ (where $e_j$ is the unit vector in $\mathbb R^d$ having a 1 at coordinate $j$), and it occurs at rate $i_\ell w_{\ell m}$ due to independence of the sites. If, instead, one removes one dimension by setting $i_a = N - \sum_{\ell\in\Sigma^*} i_\ell$ and then works with $\Sigma^*$, one obtains the additional transitions $i \to i - e_\ell$ at rate $w_{\ell a}\, i_\ell$, and $i \to i + e_\ell$ at the (affine) rate $w_{a\ell}\,\big(N - \sum_{m\in\Sigma^*} i_m\big)$.
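The linearity of the lumped rates can be confirmed by brute force for a small hypothetical case: a three-letter alphabet, $N = 3$ sites and made-up site rates `w`. The sketch sums $\mathcal U_{\sigma\tau}$ over all single-site mutants $\tau$ of $\sigma$ with composition $H(\tau) = j$ and compares the result with $i_\ell w_{\ell m}$ for $j = i - e_\ell + e_m$, and with $0$ for every other $j \neq i$.

```python
from itertools import product

SIGMA = range(3)             # hypothetical alphabet {0, 1, 2}
N = 3                        # number of sequence sites
w = [[0.0, 1.0, 0.5],
     [0.2, 0.0, 0.3],
     [0.4, 0.6, 0.0]]        # hypothetical site mutation rates w[l][m], l != m


def H(sigma):
    """Letter composition: number of sites carrying each letter."""
    return tuple(sigma.count(l) for l in SIGMA)


def lumped_rate_bruteforce(sigma, j):
    """Sum of U_{sigma tau} over single-site mutants tau with H(tau) = j."""
    total = 0.0
    for site in range(N):
        for m in SIGMA:
            if m == sigma[site]:
                continue
            tau = sigma[:site] + (m,) + sigma[site + 1:]
            if H(tau) == j:
                total += w[sigma[site]][m]
    return total


def lumped_rate_formula(i, j):
    """Rate i -> i - e_l + e_m equals i_l * w[l][m]; zero for any other j != i."""
    diff = [jj - ii for ii, jj in zip(i, j)]
    if sorted(diff) != [-1] + [0] * (len(diff) - 2) + [1]:
        return 0.0
    l, m = diff.index(-1), diff.index(1)
    return i[l] * w[l][m]


mismatches = 0
for sigma in product(SIGMA, repeat=N):
    i = H(sigma)
    for j in product(range(N + 1), repeat=len(SIGMA)):
        if sum(j) != N or j == i:
            continue
        if abs(lumped_rate_bruteforce(sigma, j) - lumped_rate_formula(i, j)) > 1e-12:
            mismatches += 1
print("mismatches:", mismatches)
```

Since the comparison runs over every $\sigma$, it also shows that the lumped rate depends on $\sigma$ only through $H(\sigma)$.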

Assumption (L2) is less specific than (L1); the fitness function in the lumped model will, in general, be nonlinear due to interactions between sites. It will, however, turn linear (or affine) if fitness contributions are additive across sites, as is usually assumed in, e.g., models of codon bias (where $\Sigma$ is the set of possible codons). Additivity reflects independent fitness contributions of the sites and means that, for $i \in S$, one has $R_i = \sum_{\ell\in\Sigma} r_\ell\, i_\ell$ (based on $\Sigma$), or $R_i = r_a N + \sum_{\ell\in\Sigma^*}(r_\ell - r_a)\, i_\ell$ (if $\Sigma^*$ is used), where $r_\ell \in \mathbb R$ for $\ell \in \Sigma$. We will examine such linear models in Subsec. 6.3.

6.2. Fixing the empirical mean 

The only property of the special choices (33) or (34) of the type space $S$ we need at the moment is that $S \subset \mathbb Z^d$. This embeds $S$ in the abelian group $\mathbb Z^d$ (elements of $S$ can be added and subtracted), and allows us to classify the possible empirical distributions $\nu \in \mathcal P(S)$ according to the value of their mean $\langle\nu, \mathrm{id}\rangle \in \mathbb R^d$. In particular, for the random measure $L^\omega(t)$ of (21), $\langle L^\omega(t), \mathrm{id}\rangle$ is a random vector in $\mathbb R^d$, namely the empirical mean, or empirical mean letter composition, along the line $\omega$ up to time $t$. If $S$ is obtained through lumping a sequence space $\Sigma^N$ as in Fig. 5, the $\ell$th coordinate of $\langle L^\omega(t), \mathrm{id}\rangle$ is the time average, up to $t$, of the number of sites showing letter $\ell \in \Sigma$ in the sequence characterizing an individual on the line $\omega$. Note that this involves a twofold averaging, namely an average over time and a (non-normalized) average over sequence sites.

² For a general description of lumping in Markov chains see [26, Ch. 6]; and for an extension to the present (branching) context with specific applications to genetics, see [3, Sec. 5 and 6]. In the present case, lumping is so immediate that it hardly needs to be formalized. But the procedure becomes nontrivial if, for example, fitness functions are derived from Hopfield energy functions (see [3, Sec. 6] and [13]).

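As indicated in (14) and (15), we will now foliate the variational formula (26) by prescribing the mean of the underlying type distribution. That is, we write

$$\lambda = \max_{z\in\mathrm{conv}\,S}\Lambda(z), \tag{35}$$

where

$$\Lambda(z) := \max_{\nu\in\mathcal P(S):\ \langle\nu,\mathrm{id}\rangle = z}\big[\langle\nu,R\rangle - I_U(\nu)\big] \tag{36}$$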

is the constrained mean fitness of $z \in \mathrm{conv}\,S$. As before, the maxima are attained by continuity, and the maximizer in (36) is unique by the strict convexity of $I_U$. The function $\Lambda$ is strictly concave; this follows again from the strict convexity of $I_U$, together with the linearity of $\langle\cdot, R\rangle$ and $\langle\cdot, \mathrm{id}\rangle$. In particular, $\Lambda$ is continuous on

$$\mathrm{rint}\,\mathrm{conv}\,S = \big\{\langle\nu,\mathrm{id}\rangle : \nu\in\mathcal P(S),\ \nu_i > 0 \text{ for all } i \in S\big\},$$

the relative interior of $\mathrm{conv}\,S$ [32, p. 82]. In general, the relative interior $\mathrm{rint}\,D$ of a set $D \subset \mathbb R^d$ is defined as the interior of $D$ relative to the smallest affine subspace containing $D$.³ Moreover, since $\alpha$ is the unique maximizer in (28), there exists a unique $\hat z \in \mathrm{conv}\,S$ that maximizes $\Lambda$, namely

$$\hat z = \langle\alpha, \mathrm{id}\rangle, \tag{37}$$

i.e., the unique maximizer $\hat z$ in (35) is the ancestral type average.

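If $U$ is reversible, we may restrict the maximization in (35) to those $z$ that are strict convex combinations of the elements of $S$. This is obvious from the explicit form of $I_U$ in (30): If at least one component of $\nu$ vanishes, then, by irreducibility and reversibility, there are $i, j$ with $\nu_i = 0 < \nu_j$ and $U_{ij}U_{ji} > 0$, and the cross terms in (30) yield $(\partial/\partial\nu_i)\,I_U(\nu) = -\infty$, so that moving mass to type $i$ decreases $I_U$ at infinite rate. Therefore, the maximum will be located in $\mathrm{rint}\,\mathrm{conv}\,S$, so that Eq. (35) can be replaced by

$$\lambda = \max_{z\in\mathrm{rint}\,\mathrm{conv}\,S}\Lambda(z). \tag{38}$$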

If the function $\Lambda(z)$ were known explicitly, the variational problem of Thm. 1 would boil down to a maximization over a subset of $\mathbb R^d$; for small $d$ one could aim at explicit solutions. Such low-dimensional variational principles for $\lambda$ were recently derived for several examples, by methods from linear algebra and asymptotic analysis [3,13,14,18]. However, a plausible understanding for the resulting function to be maximized has been lacking so far. The next Proposition reveals the probabilistic meaning of $\Lambda(z)$: It is nothing but the asymptotic growth rate of the lines having empirical type average (close to) $z$. Together with (37), this shows that the growth rate of the total population coincides with the growth rate $\Lambda(\hat z)$ of the subpopulation consisting of all individuals with empirical type average close to the ancestral one.

³ Recall that the simplex (33) is contained in a hyperplane, so that the usual interior of its convex hull is empty.

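Proposition 2. For all $z \in \mathrm{rint}\,\mathrm{conv}\,S$, the solution of the constrained variational problem (36) satisfies

$$\Lambda(z) = \lim_{\varepsilon\to0}\lim_{t\to\infty}\frac1t\log\mathbb E^i\Big(\sum_{x\in X(t)}\mathbf 1\big\{\|\langle L^x(t),\mathrm{id}\rangle - z\|_1 < \varepsilon\big\}\Big).$$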

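Proof. Consider first the parallel model. By the reasoning leading to (23), the growth rate of the subpopulation consisting of all individuals with empirical mean close to $z$, up to some maximal deviation $\varepsilon > 0$, is equal to

$$\begin{aligned}
&\lim_{t\to\infty}\frac1t\log\mathbb E^i\Big(\sum_{x\in X(t)}\mathbf 1\big\{\|\langle L^x(t),\mathrm{id}\rangle - z\|_1 < \varepsilon\big\}\Big)\\
&\quad= \lim_{t\to\infty}\frac1t\log\sum_{n\ge0}2^n\,\mathbb P^i\big(\beta(t) = n,\ T > t,\ \|\langle L(t),\mathrm{id}\rangle - z\|_1 < \varepsilon\big)\\
&\quad= \lim_{t\to\infty}\frac1t\log\mathbb E^i\Big(e^{t\langle L(t),R\rangle}\,\mathbf 1\big\{\|\langle L(t),\mathrm{id}\rangle - z\|_1 < \varepsilon\big\}\Big)\\
&\quad= \lim_{t\to\infty}\frac1t\log\mathbb E^i\Big(\exp\Big[t\Big(\langle L(t),R\rangle - \infty\cdot\mathbf 1\big\{\|\langle L(t),\mathrm{id}\rangle - z\|_1 \ge \varepsilon\big\}\Big)\Big]\Big)\\
&\quad= \max_{\nu\in\mathcal P(S):\ \|\langle\nu,\mathrm{id}\rangle - z\|_1 \le \varepsilon}\big[\langle\nu,R\rangle - I_U(\nu)\big] = \max_{y\in\mathrm{conv}\,S:\ \|y - z\|_1 \le \varepsilon}\Lambda(y).
\end{aligned} \tag{39}$$

Here we have used the conventions $\infty\cdot1 = \infty$ and $\infty\cdot0 = 0$ in the third step, and Varadhan's lemma in the fourth, in analogy with (26); the maximum over $\nu$ is attained since the condition $\|\langle\nu,\mathrm{id}\rangle - z\|_1 \le \varepsilon$ defines a compact subset of $\mathcal P(S)$. As $\Lambda$ is continuous on $\mathrm{rint}\,\mathrm{conv}\,S$, the last expression converges to $\Lambda(z)$ as $\varepsilon \to 0$, as asserted.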

For a general splitting rule, the argument is the same except that the 
particular tree construction of Subsec. 4.1 has to be replaced by the size- 
biased tree described in Subsec. 4.2. In fact, one simply has to omit the 
second line of (39) above and instead invoke Eq. (4.4) of [15] which shows 
that the first line of (39) coincides with the third; the random measure $L(t)$
in the third line is then again the empirical measure of a Markov chain with 
generator U, namely the mutation process along the spine of the size-biased 
tree. □ 

Like the unconstrained variational problem (26) leading to $\lambda$, the constrained problem (36) defining $\Lambda$ provides insight into the mutation-reproduction process, but does not, in general, lead to an explicit solution if $S$ is large. From the point of view of explicit calculations, it rather expresses one difficult problem (the leading eigenvalue of a large matrix) in terms of another difficult problem (the maximization over a large space). But if $U$ is reversible, there are two cases in which (36) may be solved explicitly or, at least, asymptotically. These are the cases when fitness and mutation are linear (already hinted at in Sec. 6.1), or when they allow a continuous approximation in the limit as the number $N$ of sequence sites grows large.
These will be discussed in the next Subsection and in Section 7. 

6.3. Exact results for linear reversible models 

In this Subsection we have a closer look at the sequence space models of Sec. 6.1 that describe the independent evolution of $N$ sites with a finite alphabet $\Sigma$ and lead, after lumping, to models with state space $S$ as in (33), with linear fitness and mutation, and mutational transitions $i \to i + k$ restricted to those with $k \in \mathcal K := \{e_m - e_\ell : m, \ell \in \Sigma,\ m \neq \ell\}$. In line with standard assumptions on sequence evolution (see, e.g., [12, Ch. 14]), we posit that the mutation process acting at the sites is reversible, that is, the mutation rates $(w_{\ell m})_{\ell,m\in\Sigma}$ define an irreducible and reversible Markov generator with reversible distribution $\gamma$ on $\Sigma$. After lumping, the associated mutation process on $S$ then has rates $U_{ij} = U_{j-i}(i)$, $i, j \in S$, given by

$$U_k(z) = \begin{cases} w_{\ell m}\, z_\ell & \text{if } k = e_m - e_\ell \in \mathcal K,\\ -\sum_{\ell\neq m} w_{\ell m}\, z_\ell & \text{if } k = 0,\\ 0 & \text{otherwise,} \end{cases}$$

for $z \in \mathbb R^d$, $d = |\Sigma|$. The reversibility of $(w_{\ell m})_{\ell,m\in\Sigma}$ readily implies that the mutation generator $U = (U_{ij})_{i,j\in S}$ is also reversible; its reversible distribution is $\mu := \mathrm{Mult}_{N,\gamma}$, the multinomial distribution for $N$ samples from the distribution $\gamma$ on $\Sigma$. As motivated in Sec. 6.1, we will also assume here that the reproduction rates are linear, in that $R_i = R(i)$ for all $i \in S$, for a linear function $R$ of the form $R(z) = r\cdot z$, $r, z \in \mathbb R^d$. Here and below we write '$\cdot$' for the scalar product of vectors in $\mathbb R^d$, in contrast to $\langle\cdot,\cdot\rangle$, which we have reserved for scalar products of vectors in $\mathbb R^S$. In this setting, the constrained variational problem (36) admits an explicit solution as follows. Due to (38), we may, and will, restrict ourselves to considering means in

$$\mathrm{rint}\,\mathrm{conv}\,S = \Big\{ z \in \mathbb R^d : z_\ell > 0 \text{ for } \ell \in \Sigma,\ \sum_{\ell\in\Sigma} z_\ell = N \Big\}.$$

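Theorem 2. In the situation described above, for every $z \in \mathrm{rint}\,\mathrm{conv}\,S$ the constrained maximum of (36) is given by

$$\Lambda(z) = R(z) - \frac12\sum_{k\in\mathcal K}\Big(\sqrt{U_k(z)} - \sqrt{U_{-k}(z)}\Big)^2 = R(z) - I_U\big(\nu^{(z)}\big), \tag{40}$$

where $\nu^{(z)} = \mathrm{Mult}_{N,z/N}$ is the multinomial distribution with mean $z$.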

Proof. Let $z \in \mathrm{rint}\,\mathrm{conv}\,S$ be given, and consider any $\nu \in \mathcal P(S)$ with $\langle\nu, \mathrm{id}\rangle = z$. It is then clear that $\langle\nu, R\rangle = R(z)$ by linearity. Let us rewrite Eqn. (30) in the form $I_U(\nu) = \frac12\sum_{k\in\mathcal K}\|x_k - y_k\|_2^2$, where $x_k = (x_{k,i})_{i\in S}$ and $y_k = (y_{k,i})_{i\in S}$ are the vectors with components

$$x_{k,i} = \sqrt{U_k(i)\,\nu_i}, \qquad y_{k,i} = \sqrt{U_{-k}(i+k)\,\nu_{i+k}},$$

$i \in S$, $k \in \mathcal K$. In the boundary case when $i \in S$ but $i + k \notin S$, we have $U_k(i) = 0$ by definition, and likewise $U_{-k}(i+k) = 0$ when $i \notin S$ but $i + k \in S$. Hence $x_{k,i} = y_{k,i} = 0$ unless $i, i+k \in S$. By linearity of the $U_k$, it follows that $\|x_k\|_2^2 = U_k(z)$ and $\|y_k\|_2^2 = U_{-k}(z)$ for all $k \in \mathcal K$. As the distance between two vectors of given lengths is minimized when the vectors are parallel, we conclude further that

$$\|x_k - y_k\|_2^2 \ge \big(\|x_k\|_2 - \|y_k\|_2\big)^2$$

with equality if and only if there is a positive constant $c_k$ so that

$$U_k(i)\,\nu_i = c_k\,U_{-k}(i+k)\,\nu_{i+k}$$

whenever $i, i+k \in S$. This, however, is the case when $\nu = \nu^{(z)}$ because, for each $i \in S$, $\nu^{(z)}_i = e^{\beta\cdot i}\mu_i$ for $\beta = \log(z/(N\gamma))$ (where the fraction and the logarithm are taken componentwise), and $\mu$ is reversible; in fact we have $c_k = e^{-\beta\cdot k}$. Combining the preceding observations we get the result. □
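Theorem 2 can be checked on a small lumped model. The Python sketch assumes a hypothetical three-letter alphabet with site rates $w_{\ell m} = \gamma_m s_{\ell m}$ ($s$ symmetric, so $w$ is reversible with respect to $\gamma$) and $N = 4$ sites. It evaluates $I_U$ via (30) on $S$ for the multinomial $\nu^{(z)}$ and for another distribution with the same mean (an equal mixture of two multinomials), and compares both with $\frac12\sum_{k\in\mathcal K}(\sqrt{U_k(z)} - \sqrt{U_{-k}(z)})^2$.

```python
import math
from itertools import product

N = 4                                                    # number of sites
L = 3                                                    # alphabet size, letters 0, 1, 2
gamma = [0.5, 0.3, 0.2]                                  # hypothetical site distribution
s = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]  # hypothetical symmetric factors
w = [[gamma[m] * s[l][m] for m in range(L)] for l in range(L)]  # reversible w.r.t. gamma

S = [i for i in product(range(N + 1), repeat=L) if sum(i) == N]
K = [(l, m) for l in range(L) for m in range(L) if l != m]      # k = e_m - e_l


def multinomial(z):
    """Mult_{N, z/N} as a dict on S."""
    p = [zl / N for zl in z]
    return {i: math.factorial(N) / math.prod(math.factorial(c) for c in i)
               * math.prod(pl ** c for pl, c in zip(p, i)) for i in S}


def rate_function(nu):
    """I_U(nu) via (30) on the lumped space S (sum over ordered pairs, halved)."""
    total = 0.0
    for i in S:
        for l, m in K:
            if i[l] == 0:
                continue                                 # U_k(i) = 0: no letter l to mutate
            j = list(i); j[l] -= 1; j[m] += 1; j = tuple(j)
            # U_ij = w[l][m] i_l and U_ji = w[m][l] j_m
            total += (math.sqrt(nu[i] * w[l][m] * i[l]) - math.sqrt(nu[j] * w[m][l] * j[m])) ** 2
    return total / 2


def bound(z):
    """1/2 sum_k (sqrt U_k(z) - sqrt U_{-k}(z))^2 with U_k(z) = w[l][m] z_l."""
    return sum((math.sqrt(w[l][m] * z[l]) - math.sqrt(w[m][l] * z[m])) ** 2 for l, m in K) / 2


z = (1.5, 1.5, 1.0)
at_multinomial = rate_function(multinomial(z))
d = (0.5, -0.3, -0.2)                                    # perturbation with zero sum
plus = multinomial(tuple(a + b for a, b in zip(z, d)))
minus = multinomial(tuple(a - b for a, b in zip(z, d)))
mix = {i: (plus[i] + minus[i]) / 2 for i in S}           # same mean z, not multinomial
print(at_multinomial, bound(z), rate_function(mix))
```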

If we turn from the linear model to the affine one, by removing one coordinate as indicated at the close of Sec. 6.1, Theorem 2 clearly remains true, with the middle expression in (40) expressed in terms of the reduced coordinates. Eq. (40) has been derived previously for certain specific choices for the mutation rates [14,18]; remarkably, the above result provides both an extension (to arbitrary reversible models), and a simplification of the proof.

6.4. Partial convex conjugation

On our way to the second case of an explicit version of $\Lambda(z)$, we need a general intermediate step: a relation between $\Lambda(z)$ and the mean growth rate $\lambda$ for a suitably modified reproduction rate $R$. This relationship is based on partial convex conjugation, a standard procedure of convex analysis which will be spelt out here for our purposes. In Sec. 7, this will allow us to determine the asymptotic behaviour of $\Lambda(z)$ when the number $N$ of sequence sites gets large.

Let us rewrite equation (26) in the form

$$\lambda(R) = \max_{\nu\in\mathcal P(S)}\big[\langle\nu,R\rangle - I_U(\nu)\big],$$

indicating the dependence on $R$; $U$ will be considered as fixed. The following proposition asserts that the function $z \mapsto -\Lambda(z)$ of constrained extrema is a partial convex conjugate of the function $R \mapsto \lambda(R)$.

Proposition 3. Let $S \subset \mathbb Z^d$, $z \in \mathrm{rint}\,\mathrm{conv}\,S$, and $U$ be an irreducible Markov generator on $S$ (not necessarily reversible). Then the constrained variational problem (36) has the solution

$$\Lambda(z) = \inf_{\beta\in\mathbb R^d}\big[\lambda(R + \beta\cdot\mathrm{id}) - \beta\cdot z\big] = \lambda(R + \beta^z\cdot\mathrm{id}) - \beta^z\cdot z = \langle\alpha^z, R\rangle - I_U(\alpha^z).$$

Here, $\beta^z \in \mathbb R^d$ is the negative slope vector of any tangent plane to $\Lambda$ at $z$, and $\alpha^z$ is the unique ancestral distribution corresponding to the reproduction rate $R + \beta^z\cdot\mathrm{id}$ for any such $\beta^z$. In particular, the function $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id})$ is differentiable on $\mathbb R^d$, and $\nabla_\beta\,\lambda(R + \beta\cdot\mathrm{id})\big|_{\beta=\beta^z} = \langle\alpha^z, \mathrm{id}\rangle = z$.

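Proof. For any $z \in \operatorname{rint}\operatorname{conv} S$ and $\beta \in \mathbb{R}^d$ we have, writing $\nu^z$ for the maximizer in (36) and using Thm. 1,

$$A(z) = \langle \nu^z, R + \beta\cdot\mathrm{id}\rangle - \beta\cdot z - I_U(\nu^z) \le \lambda(R + \beta\cdot\mathrm{id}) - \beta\cdot z.$$

Taking the infimum over $\beta$ we arrive at the inequality

$$A(z) \le \inf_{\beta \in \mathbb{R}^d} \big[\lambda(R + \beta\cdot\mathrm{id}) - \beta\cdot z\big]. \qquad (41)$$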

To show equality we recall that $A$ is strictly concave and finite on a (relative) neighbourhood of $z$ and therefore admits a tangent plane at $z$. That is, there exists some $\beta$ such that

$$A(y) \le A(z) - \beta\cdot(y - z) \quad \text{for all } y \in \operatorname{conv} S,$$

with strict inequality for $y \ne z$. Denoting by $\alpha^\beta$ the ancestral distribution for the reproduction rate $R + \beta\cdot\mathrm{id}$ and letting $y = \langle \alpha^\beta, \mathrm{id}\rangle$ we find

$$\lambda(R + \beta\cdot\mathrm{id}) - \beta\cdot z = \langle \alpha^\beta, R\rangle + \beta\cdot(y - z) - I_U(\alpha^\beta) \le A(y) + \beta\cdot(y - z) \le A(z). \qquad (42)$$

Together with (41) it follows that equality holds everywhere in (42). Hence $y = z$, (41) holds with equality, and the infimum is attained for any $\beta$ determining a tangent to $A$ at $z$. In general, there may be several such tangents, e.g., if $S$ is contained in a hyperplane of $\mathbb{R}^d$. However, the associated ancestral distribution is uniquely determined. For, suppose there exist $\beta_1 \ne \beta_2$ both determining a tangent to $A$ at $z$, and let $\alpha^{\beta_1}$ and $\alpha^{\beta_2}$ be the ancestral distributions for the reproduction rates $R + \beta_1\cdot\mathrm{id}$ and $R + \beta_2\cdot\mathrm{id}$, respectively. The preceding argument then holds for every $\beta$ in the segment $[\beta_1, \beta_2]$, whence (42) holds with equality everywhere for all these $\beta$. We can thus conclude that the function $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id})$ is affine on $[\beta_1, \beta_2]$. In particular, using Thm. 1 and the shorthand $f_n(\nu) = \langle \nu, R + \beta_n\cdot\mathrm{id}\rangle - I_U(\nu)$, $n = 1, 2$, we find

$$\max_\nu \Big[\tfrac12 f_1(\nu) + \tfrac12 f_2(\nu)\Big] = \tfrac12 \max_\nu f_1(\nu) + \tfrac12 \max_\nu f_2(\nu).$$

Since $f_1$ and $f_2$ are strictly concave, this is only possible if they have the same maximizer. That is, $\alpha^{\beta_1} = \alpha^{\beta_2}$. Finally, using the equality in (41) and the convex duality lemma [7, Lemma 4.5.8] we find that the function $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id})$ is the convex conjugate of the strictly convex function $-A$, and thus differentiable; see [32, Thm. 26.3, p. 253]. Its gradient at $\beta^z$ necessarily coincides with $z$. □
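The last assertion of Proposition 3, $\nabla_\beta \lambda(R + \beta\cdot\mathrm{id}) = \langle \alpha, \mathrm{id}\rangle$, can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the type set is $S = \{0, \dots, 4\}$, the nearest-neighbour generator $U$ and the reproduction rate $R$ are hypothetical numbers, and the ancestral distribution is represented in the usual way as the normalized product $\alpha_i \propto h_i \pi_i$ of the right and left Perron-Frobenius eigenvectors. Power iteration on a shifted matrix is used only to keep the sketch free of external libraries; the helper names are hypothetical.

```python
# Sketch: d/d(beta) lambda(R + beta*id) versus the ancestral mean <alpha, id>.
# All numbers are hypothetical; alpha_i is taken proportional to h_i * pi_i.


def perron(M, iters=5000):
    """Principal eigenvalue and right eigenvector of an irreducible Metzler matrix M.

    M + c*I is nonnegative with a positive diagonal, hence primitive, so power
    iteration converges to the Perron-Frobenius eigenvector.
    """
    n = len(M)
    c = 1.0 + max(abs(M[i][i]) for i in range(n))
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) + c * x[i] for i in range(n)]
        s = sum(y)
        lam = s - c            # x sums to one, so sum(y) estimates lambda + c
        x = [yi / s for yi in y]
    return lam, x


def transpose(M):
    return [list(row) for row in zip(*M)]


def tilted(U, R, beta):
    """A = U + diag(R_i + beta * i) on the type set {0, ..., n-1}."""
    n = len(U)
    return [[U[i][j] + (R[i] + beta * i if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def ancestral_mean(U, R, beta):
    A = tilted(U, R, beta)
    _, right = perron(A)
    _, left = perron(transpose(A))
    w = [r * l for r, l in zip(right, left)]
    total = sum(w)
    return sum(i * wi / total for i, wi in enumerate(w))


# Hypothetical nearest-neighbour mutation generator on {0, ..., 4}.
n = 5
U = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i + 1 < n:
        U[i][i + 1] = 1.0      # rate i -> i+1
    if i > 0:
        U[i][i - 1] = 0.5      # rate i -> i-1
    U[i][i] = -sum(U[i])       # generator rows sum to zero
R = [0.0, 0.3, 1.0, 0.2, -0.1]

beta, step = 0.4, 1e-5
fd = (perron(tilted(U, R, beta + step))[0] - perron(tilted(U, R, beta - step))[0]) / (2 * step)
mean = ancestral_mean(U, R, beta)
print(f"finite difference {fd:.8f}, ancestral mean {mean:.8f}")
```

The agreement is the classical first-order perturbation formula for a simple eigenvalue, which is what the differentiability statement of the proposition encodes.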
In the case of a reversible mutation matrix $U$, the preceding proposition can be complemented as follows. We write $T = \operatorname{span}(S - S) \subset \mathbb{R}^d$ for the linear space generated by the set of differences of elements of $S$.

Corollary 1. For reversible $U$ the following additional statements hold.

(a) The function $A$ defined in (36) is differentiable¹ on $\operatorname{rint}\operatorname{conv} S$, and its conjugate function $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id})$ is strictly convex on $T$. Moreover, for $z \in \operatorname{rint}\operatorname{conv} S$ and $\beta \in T$ we have $\beta = -\nabla A(z)$ if and only if $z = \nabla_\beta \lambda(R + \beta\cdot\mathrm{id})$.

(b) The function $A$ on $\operatorname{rint}\operatorname{conv} S$ remains unchanged under symmetrization, i.e., by replacing $U$ with the matrix $F$ of (31), and $R$ with the function $E$ defined in (32).

Proof. (a) Let $z \in \operatorname{rint}\operatorname{conv} S$ and $\beta, \beta'$ be two negative slope vectors of $A$ at $z$. In view of the uniqueness of $\alpha^z$ and the remarks in 5.2.2, the scalar product $(\beta - \beta')\cdot i$ is then independent of $i \in S$. This means that $\beta - \beta'$ is orthogonal to $T$, so that there is a unique negative slope vector $\beta^z \in T$. By concavity, the uniqueness of the tangent plane is equivalent to differentiability; cf. [32, Thm. 25.1, p. 242]. By the proof of Prop. 3, this is also equivalent to strict convexity of $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id})$ on $T$. The final statement comes from the observation that both assertions are equivalent to the identity $\lambda(R + \beta\cdot\mathrm{id}) - A(z) = \beta\cdot z$.

(b) For each $\beta \in \mathbb{R}^d$, the matrix $F + \operatorname{diag}(E_i + \beta\cdot i \mid i \in S)$ is similar to $U + \operatorname{diag}(R_i + \beta\cdot i \mid i \in S)$, so that their principal eigenvalues agree. The result thus follows from Prop. 3 by minimization over $\beta$. □
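The similarity used in part (b) can be verified on a small example. This sketch assumes that (31) and (32) take the form implied by (51)-(53), namely $F_{ij} = \sqrt{U_{ij}U_{ji}}$ for $i \ne j$, $F_{ii} = -\sum_{j \ne i} F_{ij}$, and $E_i = R_i + U_{ii} + \sum_{j \ne i}\sqrt{U_{ij}U_{ji}}$. The reversible generator is built as $U_{ij} = \pi_j K_{ij}$ from a hypothetical symmetric kernel $K$, which makes it reversible with respect to $\pi$; all numbers are illustrative.

```python
import math

# Sketch of Corollary 1(b): E + F is similar to U + diag(R) via D = diag(sqrt(pi)).
# Assumed forms: F_ij = sqrt(U_ij U_ji) (i != j), F_ii = -sum_{j != i} F_ij,
#                E_i = R_i + U_ii + sum_{j != i} sqrt(U_ij U_ji).


def perron_eigenvalue(M, iters=5000):
    """Principal eigenvalue of an irreducible Metzler matrix by shifted power iteration."""
    n = len(M)
    c = 1.0 + max(abs(M[i][i]) for i in range(n))
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) + c * x[i] for i in range(n)]
        s = sum(y)
        lam = s - c
        x = [yi / s for yi in y]
    return lam


# Hypothetical reversible generator: pi_i U_ij = pi_i pi_j K_ij is symmetric in i, j.
pi = [0.1, 0.2, 0.3, 0.4]
K = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 2.0, 0.3],
     [0.5, 2.0, 0.0, 1.5],
     [0.0, 0.3, 1.5, 0.0]]
R = [0.4, -0.2, 0.1, 0.0]
n = len(pi)

U = [[pi[j] * K[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
for i in range(n):
    U[i][i] = -sum(U[i])

F = [[math.sqrt(U[i][j] * U[j][i]) if i != j else 0.0 for j in range(n)] for i in range(n)]
for i in range(n):
    F[i][i] = -sum(F[i])
E = [R[i] + U[i][i] - F[i][i] for i in range(n)]   # -F_ii = sum_{j != i} sqrt(U_ij U_ji)

A = [[U[i][j] + (R[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
A_sym = [[F[i][j] + (E[i] if i == j else 0.0) for j in range(n)] for i in range(n)]

# Entrywise check of A_sym = D A D^{-1}, then of the principal eigenvalues.
sim_err = max(abs(A_sym[i][j] - math.sqrt(pi[i] / pi[j]) * A[i][j])
              for i in range(n) for j in range(n))
lam_A, lam_sym = perron_eigenvalue(A), perron_eigenvalue(A_sym)
print(sim_err, lam_A, lam_sym)
```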

7. Smooth approximations 

While still adhering to a lumped sequence model, we will now turn to a situation complementary to that of Thm. 2: we consider nonlinear reproduction and mutation rates that allow for a continuous approximation if the number of sequence sites becomes large. This approximation is only required locally, which provides much more freedom and, in particular, removes constraints imposed by the boundary (recall the boundary conditions $u_k(i) = 0$ for $i \in S$, $i + k \notin S$ in Thm. 2). For a large family of models with reversible $U$, an asymptotic low-dimensional maximum principle for $\lambda$ is then available [3], but no connection to the constrained mean fitness (36) was made there, and the ensuing probabilistic interpretation was still lacking. On the basis of Prop. 3, this can now be provided.

In view of Corollary 1(b), the case of a reversible mutation matrix $U$ can be reduced to the case of a symmetric mutation matrix $F$. That is, instead of the first-moment generator $A = U + R$ we can and will consider the symmetrized version $A = E + F$ defined in (31) and (32). In Subsection 7.1 we will present a slight refinement of an asymptotic maximum principle derived in [3, Thm. 1]. In Subsection 7.2 we will derive an approximation of $A(z)$ in two particularly interesting situations. An application to the quasispecies model follows in Sec. 8.

¹ If $T$ is a proper subspace of $\mathbb{R}^d$, differentiability means that the directional derivatives in the directions of $T$ exist, and the gradient is the unique element of $T$ determined by these directional derivatives; its component orthogonal to $T$ is thus set equal to zero.

7.1. Approximation of the asymptotic growth rate λ

Consider the following setup. For each $N$ let

- $S = S(N) \subset \mathbb{Z}^d$ be a state space as in (33) or (34).

The rescaled set $\frac{1}{N}S$ is then contained in a simplex $D \subset \mathbb{R}^d$, viz. either $D = \operatorname{conv}\{e_1, \dots, e_d\}$ or $D = \operatorname{conv}\{0, e_1, \dots, e_d\}$, with $e_1, \dots, e_d$ the unit vectors of $\mathbb{Z}^d$. (In the first case, $D$ is contained in a hyperplane, whence in the following we will always consider the relative interior of $D$ rather than simply its interior.) In the limit as $N \to \infty$, $\frac{1}{N}S$ becomes dense in $D$. For each $N$ let also

- $F$ be a symmetric Markov generator on $S$, and $E := \operatorname{diag}(E_i \mid i \in S)$ a diagonal matrix.

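We assume that $F$ and $E$ admit a continuous approximation as follows:

There exist real functions $e$ and $f_k$ on $D$, and an "approximation domain" $\Lambda \subset \operatorname{rint} D$ such that the following conditions hold.

(A1) $e$ is $C^2$ on $\Lambda$ and, as $N \to \infty$,

$$E_i = e(i/N) + O(1/N), \qquad F_{ij} = f_{j-i}(i/N) + O(1/N),$$

where the $O(1/N)$ terms are uniform for all $i, j \in S$ with $i/N, j/N \in \Lambda$.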
(A2) Uniformly for all $i$ with $i/N \in \Lambda$,

keS-i 

for some constant $C$ and all $1 \le \ell, m \le d$, where $S - i := \{j - i : j \in S\}$.
(A3) For suitable constants $C', C'' < \infty$ we have

$$-C' \le E_i \le \sup_{z \in D} e(z) + O(1/N) \quad\text{and}\quad F_{ij} \le C''$$

for all $i, j \in S$, $i \ne j$, with a uniform error term $O(1/N)$.

Theorem 3. Suppose the conditions (A1)-(A3) hold for a relatively open neighbourhood $\Lambda$ of a global maximizer $z^* \in \operatorname{rint} D$ of $e$. Then the principal eigenvalue $\lambda$ of the matrix $A = E + F$ admits the approximation

$$\lambda = e(z^*) + O(1/N).$$

The error term here only depends on the constants in (A1)-(A3) and the Hessian of $e$ at $z^*$ (via an upper bound on the modulus of its most negative eigenvalue).
We postpone the proof until Subsection 7.3, discussing first the significance of the assumptions and the result.

7.1.1 Formal comments. The above approximation for the principal eigenvalue of $A$ clearly also holds for the similar matrix $A = R + U$. Note also that only the function $e$ remains relevant in the limit; the $f_k$ play no role. This means that $A = E + F$ provides the 'right' decomposition into the 'relevant' $E$-term, and an $F$-term whose contribution to the leading eigenvalue vanishes in the limit.

It is also interesting to observe that the approximation assumption (A1) is only required in a neighbourhood $\Lambda$ of a single maximizer $z^* \in \operatorname{rint} D$ of $e$; further maxima may appear, even on the boundary, but these do not matter. This locality of the approximation domain is the main difference to Thm. 1 of [3], which requires a globally uniform approximation. (As the example of linear mutation in Thm. 2 shows, it often happens that the derivatives diverge at the relative boundary of $\operatorname{conv} S$, so that a global approximation is not feasible. This is also the case for the quasispecies model considered in Sec. 8.) As a global requirement we need only the bounds in (A3).

7.1.2 Significance of the assumptions for the model. Our setup implies that replacing $i \in S$ by $i/N \in S/N$ will yield a continuous type variable $z \in D$ in the limit. Accordingly, the matrix elements are required to become smooth functions of $z$ as $N \to \infty$, at least locally, in line with (A1).

Condition (A2) says that the mutation rates must decay fast enough with distance to the target type, again at least locally. This assumption may appear to be rather special at first sight, but actually it is very natural: As we have seen in Sec. 6.1, independent mutation at the sites of a sequence leads to nearest-neighbour mutation on $S$, hence (A2) is trivially fulfilled. For the corresponding quasispecies model (to be described below), still with independent mutation at the sites, the decay of $f_k$ with $k$ is exponential, rather than only cubic as required in (A2); this will be shown in Sec. 8.

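In many concrete examples, the reproduction and mutation rates have their own continuous approximations each, i.e.,

$$R_i = r(i/N) + O(1/N), \qquad U_{i,i+k} = u_k(i/N) + O(1/N),$$

with $C^2(D, \mathbb{R})$ functions $r$ and $u_k$. Moreover, the range of all mutational steps is finite (on $S$, and independently of $N$); that is, there is a finite symmetric (i.e., $\mathfrak{E} = -\mathfrak{E}$) set $\mathfrak{E} \subset \mathbb{Z}^d$ with the property that, for all $N$, $U_{ij} = 0$ whenever $j - i \notin \mathfrak{E}$. Then (A1) is automatically satisfied for any $\Lambda$ on which $\sqrt{u_k(z)}$ is $C^2$ for all $k \in \mathfrak{E}$; inspecting the matrix elements of $E$ in (32) and noting that $\sum_{k \in \mathfrak{E}} u_k(z) = \sum_{k \in \mathfrak{E}} u_{-k}(z)$, one finds that

$$e(z) = r(z) - \frac{1}{2} \sum_{k \in \mathfrak{E}} \Big(\sqrt{u_k(z)} - \sqrt{u_{-k}(z)}\Big)^2. \qquad (43)$$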

It is interesting to observe that the expression above is formally identical with $A(z)$ of Thm. 2, although we are considering quite a different situation here. Special cases of (43) have appeared in [3,18] in the context of parallel sequence space models, and the resulting maximum principle turned out to be a key to determining the mutation load, the genetic variance, and the existence of error thresholds.
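Theorem 3 and formula (43) can be seen at work numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes the lumped parallel model with nearest-neighbour rates $U_{i,i+1} = \mu(1 - i/N)$, $U_{i,i-1} = \nu i/N$ (cf. Subsec. 8.1.2), hypothetical parameters $\mu = \nu = 1$ and a hypothetical fitness function $r(z) = -3(z - 1/2)^2$. For nearest-neighbour mutation, (43) reduces to $e(z) = r(z) - (\sqrt{\mu(1-z)} - \sqrt{\nu z})^2$. Because the symmetrized matrix is tridiagonal, its largest eigenvalue is found by Sturm-sequence bisection; the function names are hypothetical.

```python
import math


def largest_eig_tridiag(diag, off, iters=100):
    """Largest eigenvalue of a symmetric tridiagonal matrix by Sturm-sequence bisection.

    diag[i] are the diagonal entries, off[i] the entries at (i, i+1) and (i+1, i).
    """
    n = len(diag)

    def count_below(x):
        # number of negative pivots of T - x I = number of eigenvalues < x
        count, q = 0, 1.0
        for i in range(n):
            q = diag[i] - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
            if q == 0.0:
                q = -1e-300          # perturb an exactly vanishing pivot
            if q < 0.0:
                count += 1
        return count

    radius = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d - rad for d, rad in zip(diag, radius))          # Gershgorin bounds
    hi = max(d + rad for d, rad in zip(diag, radius)) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(mid) == n:
            hi = mid                 # every eigenvalue lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def parallel_lambda(N, mu, nu, r):
    """Principal eigenvalue of U + diag(R), symmetrized with off-diagonals sqrt(U_{i,i+1} U_{i+1,i})."""
    diag = [r(i / N) - mu * (1 - i / N) - nu * i / N for i in range(N + 1)]
    off = [math.sqrt(mu * (1 - i / N) * nu * (i + 1) / N) for i in range(N)]
    return largest_eig_tridiag(diag, off)


def e_parallel(z, mu, nu, r):
    """(43) for nearest-neighbour mutation."""
    return r(z) - (math.sqrt(mu * (1 - z)) - math.sqrt(nu * z)) ** 2


mu, nu = 1.0, 1.0


def r(z):
    return -3.0 * (z - 0.5) ** 2     # hypothetical fitness function


e_max = max(e_parallel(k / 10000, mu, nu, r) for k in range(10001))
errors = {}
for N in (50, 100, 200):
    errors[N] = parallel_lambda(N, mu, nu, r) - e_max
    print(N, errors[N], N * errors[N])
```

If Theorem 3 applies, the last column ($N$ times the error) should stay bounded as $N$ grows.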

7.1.3 Locality of the ancestral distribution. Under the additional (but generic) assumptions that the function $e$ admits a unique maximizer $z^* \in \operatorname{rint} D$ and the Hessian of $e$ at $z^*$, restricted to $T = \operatorname{span}(D - D)$, is (strictly) negative definite, one can also characterize the ancestral distribution, which is connected to $\lambda$ through the general variational principle of Thm. 1. Namely, by Thm. 2 of [3], this distribution is concentrated in a neighbourhood of $z^*$ whose width decreases like $1/\sqrt{N}$. More precisely: For every $0 < \varepsilon < 1$, there is a constant $c > 0$, independent of $N$, so that, for $N$ large enough,

$$\sum_{i \in S:\ |i/N - z^*| > c/\sqrt{N}} \alpha_i \le \varepsilon. \qquad (44)$$

By Cor. 3 of [3], it follows that $\langle \alpha, \mathrm{id}/N\rangle = z^* + O(N^{-1/3})$, i.e., the ancestral type average coincides with the unique maximizer of $e$ up to a small error term. The constant in the error term here depends on those in the assumptions and on some bounds separating the spectrum of the Hessian (restricted to $T$) of $e$ at $z^*$ from $-\infty$ and $0$. The proofs given in [3] are solely based on a local approximation and thus remain valid under our weaker assumptions.
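The locality statement can be illustrated for the lumped parallel model with nearest-neighbour rates $\mu(1 - i/N)$ and $\nu i/N$ (cf. Subsec. 8.1.2). We assume the usual representation of the ancestral distribution as the normalized product of right and left Perron-Frobenius eigenvectors; after the symmetrization $T = DAD^{-1}$ this product becomes $x_i^2$, with $x$ the top eigenvector of the symmetric tridiagonal matrix $T$. The sketch below uses hypothetical values $\mu = 1$, $\nu = 0.5$, $r(z) = -3(z - 1/2)^2$, finds $\lambda$ by Sturm bisection and $x$ by a few steps of inverse iteration, and reports the mean of $\alpha$ on the rescaled type set together with $N$ times its variance. By 7.1.3 one expects the mean to approach $z^*$ and $N \cdot \mathrm{Var}$ to stay bounded.

```python
import math


def largest_eig_tridiag(diag, off, iters=100):
    """Largest eigenvalue of a symmetric tridiagonal matrix (Sturm-sequence bisection)."""
    n = len(diag)

    def count_below(x):
        count, q = 0, 1.0
        for i in range(n):
            q = diag[i] - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
            if q == 0.0:
                q = -1e-300
            if q < 0.0:
                count += 1
        return count

    radius = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d - rad for d, rad in zip(diag, radius))
    hi = max(d + rad for d, rad in zip(diag, radius)) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(mid) == n:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def solve_tridiag(a, c, rhs):
    """Thomas algorithm for a symmetric tridiagonal system (diagonal a, off-diagonal c)."""
    n = len(a)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        denom = a[i] - (c[i - 1] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (c[i - 1] * dp[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def ancestral_distribution(diag, off, lam, shift=1e-9, steps=3):
    """alpha_i proportional to x_i^2 (x: top eigenvector), via inverse iteration at lam + shift."""
    a = [d - (lam + shift) for d in diag]
    x = [1.0] * len(diag)
    for _ in range(steps):
        x = solve_tridiag(a, off, x)
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    total = sum(xi * xi for xi in x)
    return [xi * xi / total for xi in x]


mu, nu = 1.0, 0.5                     # hypothetical mutation parameters


def r(z):
    return -3.0 * (z - 0.5) ** 2      # hypothetical fitness function


def e(z):
    return r(z) - (math.sqrt(mu * (1 - z)) - math.sqrt(nu * z)) ** 2


z_star = max((k / 100000 for k in range(100001)), key=e)
stats = {}
for N in (100, 400):
    diag = [r(i / N) - mu * (1 - i / N) - nu * i / N for i in range(N + 1)]
    off = [math.sqrt(mu * (1 - i / N) * nu * (i + 1) / N) for i in range(N)]
    alpha = ancestral_distribution(diag, off, largest_eig_tridiag(diag, off))
    mean = sum(a_i * i / N for i, a_i in enumerate(alpha))
    var = sum(a_i * (i / N - mean) ** 2 for i, a_i in enumerate(alpha))
    stats[N] = (mean, var)
    print(N, z_star, mean, N * var)
```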

7.2. Approximations of the constrained mean fitness A 

Our next goal is an approximation for the partial maximum $A(z)$ of (36). In fact, the similarity of the expression (43) for $e(z)$ and the expression for $A(z)$ in Thm. 2 leads one to ask whether the asymptotic identity of the global maxima of $A$ and $e$, as asserted by Theorem 3, can be extended to an asymptotic relation between these functions as a whole. On the basis of Prop. 3 such an approximation can indeed be given. We consider first the most salient points of $e$, i.e., the points where $e$ coincides with its concave envelope. Let us say $z \in \operatorname{rint} D$ is an exposed smoothness point of $e$ if

- $e(y) < t_z(y) := e(z) + \nabla e(z)\cdot(y - z)$ for all $y \ne z$, i.e., $z$ is the unique point where $e$ hits its tangent plane $t_z$ at $z$;

- $e$ is $C^2$ on a neighbourhood of $z$, and the Hessian of $e$ at $z$, as a bilinear form on $T = \operatorname{span}(D - D)$, is negative definite.

If $e$ is strictly concave on $D$, the first condition is trivially satisfied. If $e$ is also $C^2$, the second condition just covers the generic case of strict concavity. In other words, for a generic strictly concave $C^2$-function $e$, every $z \in \operatorname{rint} D$ is an exposed smoothness point.

To state the hypotheses of the next theorem we recall that the assumptions (A1) and (A2) only involve an approximation on a local set $\Lambda$, while (A3) imposes some global bounds, including an upper bound on $E_i$ in terms of $\sup e$. We now replace the constant $\sup e$ by suitable tangent planes of $e$, thereby turning (A3) into the hypothesis




(A3') For all $z \in \Lambda$ and suitable constants $C', C'' < \infty$ we have

$$-C' \le E_i \le t_z(i/N) + O(1/N) \quad\text{and}\quad F_{ij} \le C''$$

for all $i, j \in S$, $i \ne j$, with a uniform error term $O(1/N)$.

Theorem 4. Consider a relatively open convex subset $\Lambda$ of $\operatorname{rint} D$ consisting of exposed smoothness points of $e$ and satisfying the hypotheses (A1), (A2), and (A3'). Then one has the approximation

$$A(z) = e(z) + O(N^{-1/3})$$

locally uniformly for all $z \in \Lambda$. The constants in the error term only depend on the error terms in the assumptions, some locally uniform upper bounds on $|\nabla e|$, and the Hessian of $e$ (via some locally uniform bounds separating its spectrum from $-\infty$ and $0$).

The proofs of this and the subsequent theorem follow in the next subsection. 
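Prop. 3 and Theorem 4 together suggest a direct numerical check: on the rescaled type set $S/N$, $A(z)$ is the partial convex conjugate $\inf_\beta [\lambda(R + \beta\cdot\mathrm{id}/N) - \beta z]$, which should be close to $e(z)$. The sketch below evaluates this infimum for the lumped parallel model with nearest-neighbour rates $U_{i,i+1} = \mu(1 - i/N)$, $U_{i,i-1} = \nu i/N$ (cf. Subsec. 8.1.2), hypothetical parameters $\mu = \nu = 1$, a hypothetical fitness $r(z) = -3(z - 1/2)^2$, and the illustrative point $z = 0.4$. It computes $\lambda$ by Sturm bisection on the symmetrized tridiagonal matrix and minimizes over $\beta$ by ternary search, which is legitimate because $\beta \mapsto \lambda(R + \beta\cdot\mathrm{id}/N)$ is convex (it is a convex conjugate by Prop. 3). The search interval and iteration counts are assumptions.

```python
import math


def largest_eig_tridiag(diag, off, iters=60):
    """Largest eigenvalue of a symmetric tridiagonal matrix (Sturm-sequence bisection)."""
    n = len(diag)

    def count_below(x):
        count, q = 0, 1.0
        for i in range(n):
            q = diag[i] - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
            if q == 0.0:
                q = -1e-300
            if q < 0.0:
                count += 1
        return count

    radius = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(d - rad for d, rad in zip(diag, radius))
    hi = max(d + rad for d, rad in zip(diag, radius)) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(mid) == n:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


mu, nu = 1.0, 1.0                    # hypothetical mutation parameters


def r(z):
    return -3.0 * (z - 0.5) ** 2     # hypothetical fitness function


def e(z):
    return r(z) - (math.sqrt(mu * (1 - z)) - math.sqrt(nu * z)) ** 2


def lam_tilted(N, beta):
    """Principal eigenvalue for the reproduction rate R + beta * id / N (symmetrized matrix)."""
    diag = [r(i / N) + beta * i / N - mu * (1 - i / N) - nu * i / N for i in range(N + 1)]
    off = [math.sqrt(mu * (1 - i / N) * nu * (i + 1) / N) for i in range(N)]
    return largest_eig_tridiag(diag, off)


def constrained_fitness(N, z, lo=-10.0, hi=10.0, iters=80):
    """inf over beta of lambda(R + beta*id/N) - beta*z, by ternary search on a convex function."""
    def phi(beta):
        return lam_tilted(N, beta) - beta * z
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    beta = 0.5 * (lo + hi)
    return phi(beta), beta


z = 0.4
A_N, beta_opt = constrained_fitness(200, z)
slope = (e(z + 1e-6) - e(z - 1e-6)) / 2e-6
print(A_N, e(z), beta_opt, -slope)
```

The optimal $\beta$ should also lie close to $-\nabla e(z)$, in line with the choice $\beta = -\nabla e(z)$ made in the proof of Theorem 4.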

Theorem 4 raises the question of what happens if $e$ touches a tangent plane at two or more distinct points of $\operatorname{int} D$. Let $z$ be a strict convex combination of these points, and $\beta$ the negative slope of this plane. The ancestral distribution $\alpha$ for the reproduction rate $\tilde E = E + \beta\cdot\mathrm{id}/N$ is then expected to split into distinct peaks located at the competing maximum points of the associated $\tilde e$; its mean type $\langle \alpha, \mathrm{id}/N\rangle$ will remain close to $z$, but the reproduction rate $A(z)$ will be the corresponding convex combination of the values of $e$ at the maximum points of $\tilde e$. So it may be conjectured that, in general, $A(z)$ is approximated by the concave envelope of $e$ at $z$. The next theorem shows that this is indeed the case. We note that this kind of behaviour is related to the phenomenon of error thresholds and phase transitions described in detail in [18].

For a given function $e$ on $D$ we let

$$\hat e(z) := \inf\big\{a - \beta\cdot z \;:\; a \in \mathbb{R},\ \beta \in \mathbb{R}^d,\ a - \beta\cdot y \ge e(y)\ \forall y \in D\big\}, \qquad z \in D,$$

be the concave envelope of $e$. (For an example see Fig. 8.) We consider the situation when $e$ deviates from strict concavity, so that $\hat e$ is affine on a nontrivial set $B$. Let us say that $B$ is a basin of $e$ if $B$ has nonempty relative interior and

$$B = \{z \in D \mid \hat e(z) = a - \beta\cdot z\} \qquad (45)$$

for suitable $a \in \mathbb{R}$, $\beta \in \mathbb{R}^d$. Note that a basin $B$ is necessarily convex and compact. We write $\operatorname{ex} B$ for the set of its extremal points. Let us say that a basin $B$ of $e$ is determined by smooth hills of $e$ if there exists a relatively open neighbourhood $H$ of $\operatorname{ex} B$ in $D$ such that

- $H \setminus B$ consists of exposed points of $e$, and

- $e$ is $C^2$ on $H$, and its Hessian (restricted to $T$) is negative definite with a spectrum which is bounded from below and bounded away from zero uniformly on $H$.
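In one dimension ($D = [0,1]$) the concave envelope and its basins are easy to compute on a grid: the vertices of the upper hull of the graph points (the upper half of Andrew's monotone-chain convex hull) give $\hat e$ by linear interpolation, and consecutive hull vertices that skip grid points mark chords on which $\hat e$ is affine and lies above $e$. The sketch below is a grid approximation only, with a hypothetical double-hump function $e(z) = -(16(z - 1/2)^2 - 1)^2$ whose two maxima at $z = 1/4$ and $z = 3/4$ have equal height, so that the basin is $B = [1/4, 3/4]$.

```python
def upper_hull(xs, ys):
    """Indices of the vertices of the upper (concave) hull; xs must be increasing."""
    hull = []
    for i in range(len(xs)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # cross >= 0: point k lies on or below the chord from j to i, so drop it
            cross = (xs[k] - xs[j]) * (ys[i] - ys[j]) - (ys[k] - ys[j]) * (xs[i] - xs[j])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def concave_envelope(xs, ys):
    """Grid values of the concave envelope, interpolating linearly between hull vertices."""
    hull = upper_hull(xs, ys)
    env = list(ys)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            t = (xs[i] - xs[a]) / (xs[b] - xs[a])
            env[i] = ys[a] + t * (ys[b] - ys[a])
    return env, hull


def basins(xs, hull):
    """Hull chords that skip grid points: there the envelope is affine and above e."""
    return [(xs[a], xs[b]) for a, b in zip(hull, hull[1:]) if b - a > 1]


n = 400
xs = [k / n for k in range(n + 1)]


def e(z):
    return -(16.0 * (z - 0.5) ** 2 - 1.0) ** 2   # hypothetical double hump


ys = [e(x) for x in xs]
env, hull = concave_envelope(xs, ys)
print(basins(xs, hull))
print(env[n // 2], ys[n // 2])
```

Here $\operatorname{ex} B = \{1/4, 3/4\}$, and outside $B$ the function is strictly concave, so the envelope coincides with $e$ there.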
This is the situation one typically encounters when a smooth $e$ deviates from strict concavity. The theorem below provides an approximation of the constrained maximum $A(z)$ defined in (36).

Theorem 5. Consider a basin $B \subset \operatorname{rint} D$ of $e$ that is determined by smooth hills $H$ of $e$, and suppose the assumptions (A1), (A2), and (A3') are satisfied with $\Lambda = (H \setminus B) \cup \operatorname{ex} B$. Then we have the approximation

$$A(z) = \hat e(z) + O(N^{-1/3})$$

uniformly for all $z \in B$.



7.3. Proofs

We now turn to the proofs of the three Theorems of this section. 

Proof (of Thm. 3). The proof of Thm. 1 of [3] goes through with the changes summarized below; we will refer to equations in the previous paper by double brackets ((·)). Throughout, notation changes from $x$ to $z$, $E(x)$ to $e(z)$, and $a$ to $\alpha$; $\operatorname{int} D$ is replaced by $\operatorname{rint} D$ throughout. The upper bound on $\lambda$ remains unchanged in view of ((12)) and (A3). For the lower bound, let $z^*$ be given as required, and place the test function $v = (v_i)_{i\in S}$ of ((32)) at this $z^*$. The argument after ((40)) changes as follows. Due to (A1), there exist $0 < \delta < \varepsilon$ and $0 < \gamma < \infty$ so that, for $|z - z^*| < \delta$, $e(z) \ge e(z^*) - \gamma|z - z^*|^2$. Then one has



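$$\sum_{i\in S} E_i v_i^2 = \sum_{i\in S:\ |i/N - z^*| < \delta} E_i v_i^2 + \sum_{i\in S:\ |i/N - z^*| \ge \delta} E_i v_i^2$$

$$\ge \big(e(z^*) + O(1/N)\big)\big(1 + O(e^{-aN\delta^2})\big) - \gamma \sum_{i\in S:\ |i/N - z^*| < \delta} \Big|\frac{i}{N} - z^*\Big|^2 v_i^2 + \sum_{i\in S:\ |i/N - z^*| \ge \delta} E_i v_i^2$$

$$\ge e(z^*) + O(1/N).$$

In the second step, we have used (A1), normalization ($\sum_i v_i^2 = 1$) and ((39)) (which also holds for $k = 0$, cf. Lemma 2 and Cor. 2 of [3]); the last step relies on ((39)), ((40)), and (A3).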

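In the proof of Prop. 4 of the original article, starting from the second display (p. 97), we split the sum into

$$\sum_{i,j\in S} v_i F_{ij} v_j = -\sum_{\substack{i\in S:\\ |i/N - z^*| < \delta}} \ \sum_{\substack{k\in S-i:\\ \eta(i,k) > 0}} F_{i,i+k}(v_i - v_{i+k})^2 \;-\; \sum_{\substack{i\in S:\\ |i/N - z^*| \ge \delta}} \ \sum_{\substack{k\in S-i:\\ \eta(i,k) > 0}} F_{i,i+k}(v_i - v_{i+k})^2. \qquad (46)$$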
We now note that the display in the middle of p. 97 implies that, for $|i/N - z^*| \ge \delta$, one has $v_i - v_{i+k} \le caN e^{-aN\delta^2}\,\eta(i,k)$, where $a$ and $c$ are constants, and $\eta(i,k) = O(1)$ (by the first display on p. 97). The elements of $F$ are asymptotically bounded (by (A3)), so the second sum in (46) is $O(Ne^{-aN\delta^2})$ and plays no role at the $O(1/N)$ level in the remaining calculation.

Let us finally collect the quantities that influence the error term in the result. These are: the constants in the approximation of $E$ and $F$ in (A1); the constant in the decay condition on $f$ in (A2), see Eq. ((45)); the constants in the global bounds on $E$ and $F$ in (A3), as used in this version of the proof; and the Hessian of $e$ at $z^*$ (it enters the constant $\gamma$). This completes the proof. □

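Proof (of Thm. 4). Pick any exposed smoothness point $z \in \Lambda$, and let $\beta = -\nabla e(z)$. Consider the function $\tilde e(y) := e(y) + \beta\cdot y$, $y \in \operatorname{rint} D$. By hypothesis, $\tilde e$ has the unique maximizer $z$. Assumptions (A1)-(A3) thus hold for the modified reproduction rates $\tilde E_i := E_i + \beta\cdot i/N$ and the approximating function $\tilde e$. (Note that the error terms do not depend on $\beta$.) Theorem 3 then implies that $\lambda(\tilde E) = \tilde e(z) + O(1/N)$. Next we apply Prop. 3 to the type set $S/N$ to infer that $\lambda(\tilde E) = A(\tilde z) + \beta\cdot\tilde z$ for the vector $\tilde z = \langle \alpha, \mathrm{id}/N\rangle = \nabla_\beta \lambda(E + \beta\cdot\mathrm{id}/N)$, where $\alpha$ is the ancestral distribution for the reproduction rate $\tilde E$. (Alternatively, one can invoke Cor. 1(a) to characterize $\tilde z$ by the equation $\nabla A(N\tilde z) = \nabla e(z)$.) The comments in paragraph 7.1.3 above assert that $\tilde z = z + O(N^{-1/3})$. Since $\tilde e(z) - \beta\cdot\tilde z = e(z) + \beta\cdot(z - \tilde z)$, and a Taylor expansion with $\nabla e(z) = -\beta$ gives $e(\tilde z) = e(z) + \beta\cdot(z - \tilde z) + O(|\tilde z - z|^2)$, we obtain

$$A(\tilde z) = \tilde e(z) - \beta\cdot\tilde z + O(1/N) = e(\tilde z) + O(N^{-2/3}).$$

By the assertion on the error terms in Thm. 3 and in paragraph 7.1.3, the error term here is locally uniform in $z$.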

Next we note that the ($N$-dependent) mapping $\phi: z \mapsto \tilde z$ from $\Lambda$ into $D$ is a homeomorphism. For, $\phi$ is the composition of $-\nabla e$ and $\beta \mapsto \nabla_\beta \lambda(E + \beta\cdot\mathrm{id}/N)$. Now, $\nabla e$ is a diffeomorphism from $\Lambda$ into $T = \operatorname{span}(D - D)$ because, by assumption, the Hessian of $e$ (restricted to $T$) is nondegenerate everywhere on the convex set $\Lambda$, so that $\nabla e(x) = \nabla e(y)$ only if $x = y$, by the mean value theorem. On the other hand, Corollary 1(a) shows that $\nabla_\beta \lambda(E + \beta\cdot\mathrm{id}/N)$, as a function of $\beta \in T$, has the inverse $y \mapsto -\nabla A(Ny)$; these gradients are continuous by Corollary 25.5.1 of [32].

Now let $C \subset \Lambda$ be compact and $C' \subset \Lambda$, say, a convex polytope containing $C$ in its relative interior. The map $\phi$ moves the faces of $C'$ by at most a distance of $\kappa N^{-1/3}$, for some constant $\kappa < \infty$. Hence $\phi(C') \supset C$ for large $N$. For these $N$ we can invert $\phi$ on $C$ to get $\phi^{-1}(y) = y + O(N^{-1/3})$ uniformly for all $y \in C$. Since $|\nabla e|$ is bounded on $C'$, it follows that

$$A(y) = e(\phi^{-1}(y)) + O(N^{-2/3}) = e(y) + O(N^{-1/3})$$

uniformly for all $y \in C$. □

Proof (of Thm. 5). Take any $z \in B$. By a well-known theorem of Carathéodory (Thms. 17.1 and 18.5 of [32]), $z$ is a convex combination of at most $d+1$ extremal points, that is, there exist points $z_1, \dots, z_\ell \in \operatorname{ex} B$ and numbers $s_1, \dots, s_\ell > 0$ summing to 1 such that $\ell \le d+1$ and

$$z = \sum_{k=1}^{\ell} s_k z_k.$$

Next we fix some $\varepsilon > 0$. By hypothesis, for each $k = 1, \dots, \ell$ we can find a point $y_k \in H \setminus B$ and a relatively open convex neighbourhood $\Lambda_k$ of $y_k$ such that $|y_k - z_k| < \varepsilon$ and $\Lambda_k$ consists of exposed smoothness points of $e$. Theorem 4 thus asserts that $A(y_k) = e(y_k) + O(N^{-1/3})$. In view of the assumed uniform bounds on the spectrum of the Hessians, the error term here is independent of $k$ and of the choice of $y_k$. Letting $\varepsilon \to 0$, we thus can conclude that $A(z_k) = e(z_k) + O(N^{-1/3})$, and therefore by concavity

$$A(z) \ge \sum_{k=1}^{\ell} s_k A(z_k) = \sum_{k=1}^{\ell} s_k e(z_k) + O(N^{-1/3}) = \hat e(z) + O(N^{-1/3}).$$

On the other hand, since the upper estimate on $E_i$ in (A3') also holds for $z = z_k$, assumptions (A1)-(A3) hold with $\tilde E_i = E_i + \beta\cdot i/N$ and $\tilde e := e + \beta\cdot\mathrm{id}$ in place of $E_i$ and $e$, respectively; here $\beta$ is as in (45). Prop. 3 and Thm. 3 therefore imply that, for each $k$,

$$A(z) + \beta\cdot z \le \lambda(\tilde E) = \tilde e(z_k) + O(1/N),$$

where $\lambda(\tilde E)$ stands for the principal eigenvalue of the matrix $\tilde E + F$ with reproduction rate $\tilde E$. Taking the average over $k$ (with weights $s_k$) we find

$$A(z) - O(1/N) \le -\beta\cdot z + \sum_{k=1}^{\ell} s_k \tilde e(z_k) = \hat e(z).$$

The proof is therefore complete. □

8. Application to the quasispecies model 

8.1. The model and its large-N asymptotics 

We will now illustrate and apply the results of the preceding Section to the coupled counterpart of the parallel sequence space model of Subsec. 6.1. The coupled sequence space model, known as the quasispecies model, was introduced in [9] and has, since then, been the subject of numerous investigations. It assumes that mutations occur on the occasion of reproduction events, that is, they represent replication errors. Let us assume that mutation is, again, independent across sites and occurs with probabilities $v = \mu/N$ and $w = \nu/N$ from 0 to 1 and vice versa, where $\mu$ and $\nu$ are positive and independent of $N$. This is a slight generalization of the original model [9] with symmetric mutation, and the factor $1/N$ in the mutation rate is introduced to obtain a suitable limit².
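The matrix of mutation probabilities, $\mathcal{P} = (\mathcal{P}_{\sigma\tau})_{\sigma,\tau \in \Sigma}$, is then given by

$$\mathcal{P} = \bigotimes_{n=1}^{N} \begin{pmatrix} 1 - v & v \\ w & 1 - w \end{pmatrix}, \qquad (47)$$

where the tensor product reflects the independence across sites. The quasispecies model is complete if we further specify birth rates $B_\sigma$ and death rates $D_\sigma$ for all $\sigma \in \Sigma$. When a birth event occurs to a $\sigma$ individual, it survives unchanged and produces an offspring of type $\tau$ with probability $\mathcal{P}_{\sigma\tau}$; at a death event, a $\sigma$ individual dies (as in Fig. 2 with $i, j$ replaced by $\sigma, \tau$).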

We will assume that, for all $\sigma \in \Sigma$, $B_\sigma$ and $D_\sigma$ are invariant under permutation of sites. Since the same holds, by construction, for the mutation probabilities (47), we have a situation analogous to (L1) and (L2) for the parallel model, and may perform lumping into $S := \{0, 1, \dots, N\}$ by the mapping $\sigma \mapsto H(\sigma) \in S$, where $H(\sigma)$ is the number of sites occupied by letter 1 (see Sec. 6.1). The resulting model on $S$ has birth rates $B_i$, death rates $D_i$, and mutation probabilities $P_{ij}$, where $B_\sigma = B_i$, $D_\sigma = D_i$, and

$$\sum_{\tau:\ H(\tau) = j} \mathcal{P}_{\sigma\tau} = P_{ij} \qquad (48)$$

for any $\sigma$ with $H(\sigma) = i$.

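In the lumped model, given the current type $i$, the distribution of jumps is obviously given by the convolution

$$P_{i,i+\cdot} = \mathrm{Bin}_{N-i,v} * \overline{\mathrm{Bin}}_{i,w}, \qquad (49)$$

where $\mathrm{Bin}_{n,p}$ denotes the binomial distribution with parameters $n$ and $p$, and $\overline{\mathrm{Bin}}_{n,p}$ its image under the reflection of $\mathbb{Z}$ at the origin; we further identify $\mathrm{Bin}_{0,p}$ with the point measure located at 0. Explicitly,

$$P_{ij} = (1-v)^{N-i}(1-w)^{i} \sum_{\substack{\ell, m \ge 0:\\ \ell - m = j - i}} \binom{N-i}{\ell}\binom{i}{m}\Big(\frac{v}{1-v}\Big)^{\ell}\Big(\frac{w}{1-w}\Big)^{m}. \qquad (50)$$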

The Markov chain so defined is reversible with respect to $\varphi = (\varphi_i)_{i\in S} = \mathrm{Bin}_{N, v/(v+w)}$; this is most easily seen by noting that $\mathcal{P}$ (on sequence space) is reversible with respect to the Bernoulli measure on $\{0,1\}^N$ with parameter $v/(v+w)$.
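The reversibility claim can be confirmed directly from the explicit formula (50). The sketch below builds the lumped matrix $P$ for hypothetical values $N = 12$, $\mu = 1$, $\nu = 0.3$, checks that every row is a probability distribution, and checks detailed balance $\varphi_i P_{ij} = \varphi_j P_{ji}$ with $\varphi = \mathrm{Bin}_{N, v/(v+w)}$; the helper names are hypothetical.

```python
from math import comb


def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def lumped_P(N, mu, nu):
    """P_ij of (50): ell new 1s among the N-i zeros, m lost 1s among the i ones."""
    v, w = mu / N, nu / N
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        for ell in range(N - i + 1):
            for m in range(i + 1):
                P[i][i + ell - m] += binom_pmf(N - i, v, ell) * binom_pmf(i, w, m)
    return P, v, w


N, mu, nu = 12, 1.0, 0.3          # hypothetical values
P, v, w = lumped_P(N, mu, nu)
phi = [binom_pmf(N, v / (v + w), i) for i in range(N + 1)]

row_err = max(abs(sum(row) - 1.0) for row in P)
balance_err = max(abs(phi[i] * P[i][j] - phi[j] * P[j][i])
                  for i in range(N + 1) for j in range(N + 1))
print(row_err, balance_err)
```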

The lumped Markov branching process has first-moment generator $A$ with elements $A_{ij} = B_i P_{ij} - D_i\delta_{ij}$ (cf. Eq. (4)), and has been much studied,



² The factor $1/N$ may come as somewhat unexpected, but it means nothing but a change of time scale, which will not alter the long-term asymptotics. For a thorough discussion of the related scaling issues, see [4]; in the language of that article, we use intensive scaling here.
see [10] for a review of early work, and [24] for a review of recent theoretical developments, and their connection to experimental results on virus
evolution. In particular, the error thresholds displayed by this model have 
attracted a lot of attention. 

The function $e(z)$ that would simplify the model's analysis does not seem to have appeared so far; it is far less obvious than its parallel counterpart (43), and will be established in what follows. We start by decomposing $A$ of (4) into a Markov generator $U$ and a diagonal matrix $R$, which gives $U_{ij} = B_i P_{ij}$ for $i \ne j$, $U_{ii} = -B_i(1 - P_{ii})$, and $R_i = \sum_{j\in S} A_{ij} = B_i - D_i$. Since $P = (P_{ij})_{i,j\in S}$ is reversible, $U$ is also reversible: its reversible distribution $\pi$ is given by $\pi_i = c\,\varphi_i / B_i$ for a normalizing constant $c > 0$. The elements of the symmetrized matrices $E$ and $F$ of (32) and (31) therefore emerge as

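$$F_{ij} = \sqrt{B_i P_{ij} P_{ji} B_j} \quad \text{for } i \ne j, \qquad (51)$$

$$F_{ii} = -\sum_{j \ne i} F_{ij}, \qquad (52)$$

and

$$E_i = -D_i + \sum_{j\in S} \sqrt{B_i P_{ij} P_{ji} B_j}. \qquad (53)$$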

After these preparations, let us identify conditions under which Thms. 3, 4, and 5 are applicable. We will consider the approximation of the birth and death rates as given; we will then show that the Poisson approximation to the distribution $P_{i,i+\cdot}$, namely $p(i/N) = \mathrm{Poi}_{\mu(N-i)/N} * \overline{\mathrm{Poi}}_{\nu i/N}$, will also lead to the 'right' approximation to the matrix elements (51)-(53). In line with previous notation, $\mathrm{Poi}_\lambda$ is the Poisson distribution with parameter $\lambda > 0$, $\overline{\mathrm{Poi}}_\lambda$ its reflected version, and $\mathrm{Poi}_0$ is identified with the point measure at 0. This will give us the following result.

Theorem 6. Consider the lumped quasispecies model, with first-moment generator $A$ of (4) on $S = S(N) := \{0, 1, \dots, N\}$; birth rates $B_i > 0$, death rates $D_i > 0$, and mutation probabilities $P_{ij}$ as in (50). Assume that

$$B_i = b(i/N) + O(1/N) \quad\text{and}\quad D_i = d(i/N) + O(1/N),$$

where $b$ and $d$ are functions on $D := [0,1]$, $b$ is strictly positive, and the constants in the $O(1/N)$ bounds are uniform for all $i \in S$. For $z \in D$ let

$$g(z) := \Big(\sqrt{\mu(1-z)} - \sqrt{\nu z}\Big)^2 \qquad (54)$$

and

$$e(z) := b(z)\,e^{-g(z)} - d(z). \qquad (55)$$

Assume further that $e''$ has only finitely many zeroes. It then follows that

$$\lambda = \max_{z \in D} e(z) + O(1/N) \quad\text{and}\quad A(z) = \hat e(z) + O(N^{-1/3})$$

locally uniformly for $z \in\, ]0,1[$.
Postponing the proof for a moment, let us first look at an example.

8.1.1 An example. For the purpose of illustration, let us consider the quasispecies model with a 'smoothed' version of truncation selection (where a gene tolerates a certain number of mutations and then deteriorates rapidly). Let the birth and death rate functions be given by

b{z) := d{z) := where r{z) := e-^^^^* (56) 

(i.e., we assume a mixture of fecundity and viability selection). Fig. 8 shows 
the fitness function, and the function e together with its concave envelope. 







Fig. 8. The quasispecies example (56), with $\gamma = 5$, and $\mu = 1$, $\nu = 0.3$. Left: The fitness function, $r = b - d$. Right: The function $e$ (solid line) and its concave envelope $\hat e$ (dashed), where it deviates from $e$.

8.1.2 Connection to the parallel model. The quasispecies model is closely related to the lumped parallel sequence space model with birth rates $B_i$, death rates $D_i$, and mutation rates $U_{i,i+1} = \mu(1 - i/N)$, $U_{i,i-1} = \nu i/N$ and $U_{ij} = 0$ for $|j - i| > 1$ (where $\mu$ and $\nu$ are now mutation rates per site rather than probabilities). In fact, the latter may be considered as the former's weak-selection weak-mutation limit (cf. [5, Ch. 11.1.2], and [19]). It leads to the simpler expression

$$e(z) = b(z) - d(z) - g(z), \qquad (57)$$

cf. (43), and [3,18]. Indeed, this function is easily identified as the weak-selection weak-mutation limit of (55) by replacing $b$ by $1 + \delta b$, $d$ by $1 + \delta d$, $\mu$ by $\mu\delta$, $\nu$ by $\nu\delta$, and $e$ by $e/\delta$; the last replacement means that time is measured in units of $\delta$. Since $g$ is homogeneous of degree one in $(\mu, \nu)$, one has $(1 + \delta b)e^{-\delta g} - (1 + \delta d) = \delta(b - d - g) + O(\delta^2)$, so that $e(z)$ of (57) emerges from (55) in the limit $\delta \to 0$.
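The weak-selection weak-mutation limit can also be checked numerically. In the sketch below, the perturbations $b(z) = z$, $d(z) = 0.2$ and the parameters $\mu = 1$, $\nu = 0.3$ are hypothetical choices made only for illustration. The rescaled quasispecies function $[(1 + \delta b)e^{-\delta g} - (1 + \delta d)]/\delta$, with $g$ evaluated at the scaled rates $\mu\delta$, $\nu\delta$, is compared with (57); by the expansion above the discrepancy should shrink linearly in $\delta$.

```python
import math


def g(z, mu, nu):
    """g(z) of (54)."""
    return (math.sqrt(mu * (1 - z)) - math.sqrt(nu * z)) ** 2


def e_quasispecies(z, b, d, mu, nu):
    """e(z) of (55)."""
    return b(z) * math.exp(-g(z, mu, nu)) - d(z)


def e_parallel(z, b, d, mu, nu):
    """e(z) of (57)."""
    return b(z) - d(z) - g(z, mu, nu)


def b(z):
    return z        # hypothetical birth-rate perturbation


def d(z):
    return 0.2      # hypothetical death-rate perturbation


mu, nu = 1.0, 0.3


def rescaled(z, delta):
    """b -> 1 + delta*b, d -> 1 + delta*d, mu -> mu*delta, nu -> nu*delta, e -> e/delta."""
    return e_quasispecies(z, lambda y: 1.0 + delta * b(y), lambda y: 1.0 + delta * d(y),
                          mu * delta, nu * delta) / delta


zs = [k / 20 for k in range(21)]
errs = []
for delta in (1e-1, 1e-2, 1e-3):
    errs.append(max(abs(rescaled(z, delta) - e_parallel(z, b, d, mu, nu)) for z in zs))
    print(delta, errs[-1])
```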

8.2. Proof 

The proof of Thm. 6 consists in verifying the assumptions of Thms. 3, 4 and 5 for the matrices $E$ and $F$ in (51)-(53). The main difficulty will be to establish the $O(1/N)$ approximation as required in (A1). Besides the approximating function $e$ in (55) for $E$, the approximating functions $f_k$ for $F$ will be given by

$$f_0(z) := b(z)\big(p_0(z) - e^{-g(z)}\big), \qquad f_k(z) := b(z)\sqrt{p_k(z)\,p_{-k}(z)} \quad (0 \ne k \in \mathbb{Z}), \qquad (58)$$

where

$$p_k(z) := \big(\mathrm{Poi}_{\mu(1-z)} * \overline{\mathrm{Poi}}_{\nu z}\big)(k), \qquad k \in \mathbb{Z}. \qquad (59)$$

These functions are quite natural, as they are obtained by replacing the binomial distributions at hand by their Poisson approximations. Nevertheless, the required approximation result is not at all automatic: Although $\mathrm{Bin}_{N-i,\mu/N}$ and $\overline{\mathrm{Bin}}_{i,\nu/N}$ deviate from $\mathrm{Poi}_{\mu(N-i)/N}$ and $\overline{\mathrm{Poi}}_{\nu i/N}$, respectively, by $O(1/N)$ in variational distance [29, Section II.5], and this carries over to the convolution, it remains to be shown that the corresponding symmetrized quantities share this property. The key to this task is the fact that the Poisson distributions are particularly well-suited for a geometric symmetrization as in (51). This is the content of the following lemma.

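Lemma 1. For $a, b \ge 0$, let $p(a,b) = (p_k(a,b))_{k\in\mathbb{Z}} := \mathrm{Poi}_a * \overline{\mathrm{Poi}}_b$ be the convolution of the parameter-$a$ Poisson distribution with the reflected parameter-$b$ Poisson distribution. Then

$$\sqrt{p_k(a,b)\,p_{-k}(a,b)} = e^{-(\sqrt a - \sqrt b)^2}\, p_k\big(\sqrt{ab}, \sqrt{ab}\big)$$

for all $k \in \mathbb{Z}$.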

Proof. Since $p_k(a,0) = 0$ for $k < 0$, and $p_k(0,b) = 0$ for $k > 0$, the conclusion is immediate if either $a$ or $b$ vanishes. For $a, b > 0$, the explicit formula

$$p_k(a,b) = e^{-a-b} \sum_{\substack{\ell, m \ge 0:\\ \ell - m = k}} \frac{a^\ell b^m}{\ell!\, m!} \qquad (60)$$

readily implies that $p_{-k}(a,b) = p_k(a,b)(b/a)^k$, whence

$$\sqrt{p_k(a,b)\,p_{-k}(a,b)} = p_k(a,b)\,(b/a)^{k/2} \qquad (61)$$

for all $k \in \mathbb{Z}$. Inserting (60) into the last term and comparing the result with the similar expression for $p_k(\sqrt{ab}, \sqrt{ab})$ we obtain the conclusion of the lemma. □
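Lemma 1, and its summed consequence $\sum_k \sqrt{p_k p_{-k}} = e^{-(\sqrt a - \sqrt b)^2}$ that underlies (63) below, are easy to confirm numerically. The sketch evaluates $p_k(a,b)$ from the series (60), truncated after a hypothetical cutoff of 200 terms (ample for parameters of order one), with illustrative values $a = 1.3$, $b = 0.4$.

```python
import math


def p(k, a, b, terms=200):
    """p_k(a, b) = (Poi_a * reflected Poi_b)(k) via the series (60); requires a, b > 0."""
    if k < 0:
        return p(-k, b, a, terms)      # reflecting k swaps the roles of a and b
    total = 0.0
    for m in range(terms):
        ell = m + k                    # ell - m = k
        total += math.exp(ell * math.log(a) + m * math.log(b)
                          - math.lgamma(ell + 1) - math.lgamma(m + 1))
    return math.exp(-a - b) * total


a, b = 1.3, 0.4                        # hypothetical parameters
c = math.sqrt(a * b)
damping = math.exp(-(math.sqrt(a) - math.sqrt(b)) ** 2)

# pointwise identity of Lemma 1 (relative error)
max_rel = max(abs(math.sqrt(p(k, a, b) * p(-k, a, b)) - damping * p(k, c, c))
              / (damping * p(k, c, c)) for k in range(-6, 7))
# summed form: the p_k(c, c) add up to one
total = sum(math.sqrt(p(k, a, b) * p(-k, a, b)) for k in range(-60, 61))
print(max_rel, total, damping)
```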

We will be particularly interested in the Poisson approximation to the right-hand side of (49), viz.

$$p_k(z) := p_k\big(\mu(1-z), \nu z\big), \qquad (62)$$

where $z := i/N$. Lemma 1 then implies (because the $p_k(\sqrt{ab}, \sqrt{ab})$, $k \in \mathbb{Z}$, sum to one) that

$$\sum_{k\in\mathbb{Z}} \sqrt{p_k(z)\,p_{-k}(z)} = e^{-g(z)}, \qquad (63)$$

thereby explaining the origin of the function $g$ defined in (54). We will also need the following tail estimate.
Lemma 2. For all $a, b \ge 0$,

$$\sum_{k\in\mathbb{Z}:\ |k| \ge \sqrt N} \sqrt{p_k(a,b)\,p_{-k}(a,b)} \le \frac{2\sqrt{ab}}{N}.$$

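Proof. If $a = 0$ or $b = 0$, then $p_k(a,b)\,p_{-k}(a,b) = 0$ except for $k = 0$, so that the assertion is trivial. For $a, b > 0$ we can write, using Lemma 1 and Markov's inequality:

$$N \sum_{k\in\mathbb{Z}:\ |k|\ge\sqrt N} \sqrt{p_k(a,b)\,p_{-k}(a,b)} \le \sum_{k\in\mathbb{Z}} k^2\, p_k\big(\sqrt{ab}, \sqrt{ab}\big).$$

By symmetry, the last sum is the variance of $\mathrm{Poi}_{\sqrt{ab}} * \overline{\mathrm{Poi}}_{\sqrt{ab}}$ and equal to $2\sqrt{ab}$. □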
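A quick numerical check of the tail bound in Lemma 2 and of the variance identity used in its proof, with hypothetical parameters $a = 2$, $b = 0.5$ (so that $\sqrt{ab} = 1$) and a few values of $N$; $p_k(a,b)$ is again evaluated from the series (60), truncated after 200 terms, and $k$ is restricted to $|k| \le 80$, where the neglected mass is negligible.

```python
import math


def p(k, a, b, terms=200):
    """p_k(a, b) via the series (60); requires a, b > 0."""
    if k < 0:
        return p(-k, b, a, terms)
    total = 0.0
    for m in range(terms):
        ell = m + k
        total += math.exp(ell * math.log(a) + m * math.log(b)
                          - math.lgamma(ell + 1) - math.lgamma(m + 1))
    return math.exp(-a - b) * total


a, b = 2.0, 0.5                        # hypothetical parameters
ks = range(-80, 81)
sym = {k: math.sqrt(p(k, a, b) * p(-k, a, b)) for k in ks}
results = []
for N in (4, 16, 64, 256):
    tail = sum(s for k, s in sym.items() if abs(k) >= math.sqrt(N))
    bound = 2 * math.sqrt(a * b) / N
    results.append((N, tail, bound))
    print(N, tail, bound)

# second moment used in the proof: Var(Poi_c * reflected Poi_c) = 2c
c = math.sqrt(a * b)
second_moment = sum(k * k * p(k, c, c) for k in ks)
```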

We need a similar tail estimate for the geometric symmetrization of the 
matrix P defined in (50). Note that P depends on N. 

Lemma 3. For all $i \in S$,

$$\sum_{j\in S:\ |j - i| \ge \sqrt N} \sqrt{P_{ij} P_{ji}} \le \frac{C}{N}$$

for a constant $C$ depending on $\mu$ and $\nu$ but not on $N$.
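Proof. We use the obvious inequality

$$\mathrm{Bin}_{n,p}(\ell) = (1-p)^n \binom{n}{\ell}\Big(\frac{p}{1-p}\Big)^{\ell} \le e^{a - np}\,\mathrm{Poi}_a(\ell), \qquad (64)$$

which holds whenever $0 \le \ell \le n$, $0 \le p < 1$, and $a \ge np/(1-p)$ (it follows from $\binom{n}{\ell} \le n^\ell/\ell!$ and $(1-p)^n \le e^{-np}$). This implies that $P_{i,i+k} \le e^{a+b} p_k(a,b)$ with $a := \mu/(1-v)$, $b := \nu/(1-w)$, uniformly in $i \in S$. Hence

$$\sqrt{P_{i,i+k} P_{i+k,i}} \le e^{a+b} \sqrt{p_k(a,b)\,p_{-k}(a,b)}, \qquad (65)$$

and the result follows from Lemma 2. □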
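Inequality (64) is elementary, but a grid check is cheap. The sketch below evaluates both sides for the smallest admissible value $a = np/(1-p)$ over hypothetical ranges of $n$, $p$ and $\ell$; the largest observed value of the left-hand side minus the right-hand side should not be positive (equality occurs only in the trivial cases $n = 0$ or $p = 0$).

```python
import math
from math import comb


def binom_pmf(n, p, ell):
    return comb(n, ell) * p ** ell * (1 - p) ** (n - ell)


def poisson_pmf(a, ell):
    return math.exp(-a) * a ** ell / math.factorial(ell)


worst = -math.inf
for n in range(0, 41):
    for p in (0.0, 0.01, 0.05, 0.2, 0.5, 0.9):
        a = n * p / (1 - p)                 # smallest admissible a in (64)
        for ell in range(n + 1):
            gap = binom_pmf(n, p, ell) - math.exp(a - n * p) * poisson_pmf(a, ell)
            worst = max(worst, gap)
print(worst)
```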

The crucial step is the following Poisson approximation of the geometric 
symmetrization of P. 

Proposition 4. With the abbreviation $p_k(z) := p_k(\mu(1-z), \nu z)$, we have

$$\sum_{j\in S} \Big|\sqrt{P_{ij}P_{ji}} - \sqrt{p_{j-i}(i/N)\,p_{i-j}(i/N)}\Big| = O\Big(\frac{1}{N}\Big) \qquad (66)$$

uniformly in $i$ as long as $i/N$ is bounded away from 0 and 1.
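Before turning to the proof, the statement can be probed numerically. The sketch below computes the left-hand side of (66) exactly from (50) for hypothetical values $\mu = 1$, $\nu = 0.5$ and $i = N/2$, with $p_k(z)$ evaluated from the series (60) truncated after 120 terms. If the proposition holds, $N$ times the printed sum should stay bounded as $N$ grows; the helper names are hypothetical.

```python
import math
from math import comb


def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def P_entry(N, v, w, i, j):
    """P_ij of (50): ell - m = j - i with ell ~ Bin(N-i, v) and m ~ Bin(i, w)."""
    return sum(binom_pmf(N - i, v, j - i + m) * binom_pmf(i, w, m) for m in range(i + 1))


def p_k(k, a, b, terms=120):
    """p_k(a, b) from the series (60); requires a, b > 0."""
    if k < 0:
        return p_k(-k, b, a, terms)
    s = sum(math.exp((m + k) * math.log(a) + m * math.log(b)
                     - math.lgamma(m + k + 1) - math.lgamma(m + 1)) for m in range(terms))
    return math.exp(-a - b) * s


def lhs_66(N, i, mu, nu):
    """Left-hand side of (66) for the type i."""
    v, w = mu / N, nu / N
    a, b = mu * (1 - i / N), nu * i / N
    total = 0.0
    for j in range(N + 1):
        exact = math.sqrt(P_entry(N, v, w, i, j) * P_entry(N, v, w, j, i))
        poisson = math.sqrt(p_k(j - i, a, b) * p_k(i - j, a, b))
        total += abs(exact - poisson)
    return total


mu, nu = 1.0, 0.5             # hypothetical mutation parameters
errs = {}
for N in (50, 100, 200):
    errs[N] = lhs_66(N, N // 2, mu, nu)
    print(N, errs[N], N * errs[N])
```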
Proof. Consider an arbitrary $i \in S$ and suppose $z := i/N$ is bounded away from 0 and 1. (We will generally suppress the $i$-dependence of all abbreviations to be introduced below.) The main difficulty of the proof comes from the fact that the parameters of the probabilities $P_{ji}$ depend on the variable $j$ rather than $i$. Fortunately, $P$ is reversible, and Lemmas 2 and 3 allow us to confine ourselves to the $j$'s with $|j - i| < \sqrt N$. We proceed by a comparison of upper and lower bounds on

$$r_k := \sqrt{P_{i,i+k}\, P_{i+k,i}}, \qquad k \in S - i.$$

Step 1: A lower estimate. Since $P$ is reversible with respect to $\mathrm{Bin}_{N,v/(v+w)}$, we
have

$$P_{i+k,i} = P_{i,i+k}\,\Big(\frac{w}{v}\Big)^{k}\,\frac{(i+k)!\,(N-i-k)!}{i!\,(N-i)!}$$

for all $k \in S-i$. Since also $m!/n! \ge n^{m-n}$ for all $m, n$, it follows that

$$r_k \ge L_k := P_{i,i+k}\,s^k \tag{67}$$

for all $k$, where $s := \big(iw/((N-i)v)\big)^{1/2} = \big(z\nu/((1-z)\mu)\big)^{1/2}$.

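We now take the sum over $k$. Using (50) and the binomial formula we
can write

$$\begin{aligned}
\sum_{k\in S-i} r_k \ \ge\ \sum_{k\in S-i} L_k &= (1-v+vs)^{N-i}\,(1-w+w/s)^{i}\\
&\ge \exp\big(-\mu(1-s)(1-z)\big)\Big(1-\frac{\mu^2(1-s)^2(1-z)^2}{N-i}\Big)\\
&\quad\times \exp\big(-\nu(1-1/s)z\big)\Big(1-\frac{\nu^2(1-1/s)^2z^2}{i}\Big)\\
&= e^{-g(z)} + O\Big(\frac1N\Big).
\end{aligned}\tag{68}$$

The inequality follows from the fact that

$$\Big(1+\frac xn\Big)^n \ge e^x\Big(1-\frac{x^2}{n}\Big)$$

for any $n \ge 1$ and $|x| \le n$; see [31, 3.6.2, p. 266]. In the last step we used
that $z$ is bounded away from 0 and 1.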
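The chain (68) can be traced numerically. The Python sketch below assumes that (50) describes independent site mutations, with each of the $N-i$ sites carrying letter 0 flipping with probability $v = \mu/N$ and each of the $i$ sites carrying letter 1 flipping back with probability $w = \nu/N$; this reading is inferred from the reversibility with respect to $\mathrm{Bin}_{N,v/(v+w)}$ and the formula for $s$, not quoted from (50). The names `row`, `step1` and `exp_lower_bound_ok` are hypothetical.

```python
import math

def binom_pmf(n: int, p: float, l: int) -> float:
    """Bin_{n,p}(l), zero outside 0..n; evaluated in log space (0 < p < 1)."""
    if l < 0 or l > n:
        return 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(l + 1) - math.lgamma(n - l + 1)
                    + l * math.log(p) + (n - l) * math.log1p(-p))

def row(N: int, i: int, mu: float, nu: float) -> dict:
    """Assumed form of (50): returns {k: P_{i,i+k}} for per-site flip
    probabilities v = mu/N (0 -> 1) and w = nu/N (1 -> 0)."""
    v, w = mu / N, nu / N
    out: dict = {}
    for up in range(N - i + 1):
        pu = binom_pmf(N - i, v, up)
        for down in range(i + 1):
            out[up - down] = out.get(up - down, 0.0) + pu * binom_pmf(i, w, down)
    return out

def step1(N: int, i: int, mu: float, nu: float):
    """Return (sum_k L_k, closed form, lower bound, target) from (68)."""
    z, v, w = i / N, mu / N, nu / N
    s = math.sqrt(z * nu / ((1 - z) * mu))
    direct = sum(prob * s ** k for k, prob in row(N, i, mu, nu).items())
    closed = (1 - v + v * s) ** (N - i) * (1 - w + w / s) ** i
    x1, x2 = -mu * (1 - s) * (1 - z), -nu * (1 - 1 / s) * z
    target = math.exp(x1 + x2)     # equals e^{-g(z)} for this choice of s
    lower = target * (1 - x1 ** 2 / (N - i)) * (1 - x2 ** 2 / i)
    return direct, closed, lower, target

def exp_lower_bound_ok(x: float, n: int) -> bool:
    """(1 + x/n)^n >= e^x (1 - x^2/n), stated for n >= 1 and |x| <= n."""
    return (1 + x / n) ** n >= math.exp(x) * (1 - x * x / n) - 1e-12

mu, nu = 2.0, 1.0                  # illustrative parameters
for N in (50, 100, 200):
    print(N, step1(N, 2 * N // 5, mu, nu))
```

The first two returned values agree up to rounding (the binomial formula), the third is the lower bound on the right of (68), and the gap between the closed form and $e^{-g(z)}$ shrinks roughly like $1/N$.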

Step 2: An upper bound. Arguing as for (67) we find

$$r_k \le P_{i,i+k}\Big(\frac{(i+k)\,w}{(N-i-k)\,v}\Big)^{k/2} = P_{i,i+k}\,s^k\,\big(\psi(k/N)\big)^k,$$

where $\psi(x) := (1+x/z)^{1/2}\,\big(1-x/(1-z)\big)^{-1/2}$. Next, for each $k$ an expansion
of $\psi$ gives

$$\big(\psi(k/N)\big)^k = 1 + \frac{k^2}{N}\,\big(\psi(\xi_k)\big)^{k-1}\psi'(\xi_k)$$

for some $\xi_k$ between 0 and $k/N$. As long as $z$ is bounded away from 0
and 1 and $|k| \le \sqrt N$, the $\psi$- and $\psi'$-expressions on the right-hand side are
bounded from above, so that

$$\big(\psi(k/N)\big)^k \le 1 + \frac{c\,k^2}{N}$$

for some $c < \infty$. Also, using (64) we find

$$P_{i,i+k} \le e^{va+wb}\,p_k(a,b) \quad\text{with}\quad a := \frac{\mu(1-z)}{1-v},\quad b := \frac{\nu z}{1-w}. \tag{70}$$

Collecting all estimates we arrive at the upper bound

$$r_k \le \bar r_k := e^{va+wb}\,p_k(a,b)\,s^k\Big(1+\frac{c\,k^2}{N}\Big). \tag{71}$$
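The finiteness of the constant $c$ in the bound $(\psi(k/N))^k \le 1 + ck^2/N$ can be probed numerically. In this Python sketch `psi` follows the definition above, while `c_needed` (hypothetical) returns the smallest $c$ that works for a given $N$ and $z$ over $0 < |k| \le \sqrt N$.

```python
import math

def psi(x: float, z: float) -> float:
    """psi(x) = (1 + x/z)^{1/2} (1 - x/(1-z))^{-1/2}, for -z < x < 1 - z."""
    return math.sqrt(1 + x / z) / math.sqrt(1 - x / (1 - z))

def c_needed(N: int, z: float) -> float:
    """Smallest c with psi(k/N)^k <= 1 + c k^2/N for all 0 < |k| <= sqrt(N)."""
    r = math.isqrt(N)
    return max(N * (psi(k / N, z) ** k - 1) / k ** 2
               for k in range(-r, r + 1) if k != 0)

for z in (0.3, 0.5):
    print(z, [round(c_needed(N, z), 3) for N in (100, 10_000, 1_000_000)])
```

The required constant stays of order one as $N$ grows, and it increases as $z$ moves towards 0 or 1, which is why $z$ has to be bounded away from the endpoints.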
Next, a summation over $k$ gives

$$\sum_{|k|\le\sqrt N} r_k \le \sum_{|k|\le\sqrt N} \bar r_k \le e^{va+wb}\Big(\sum_{k\in\mathbb Z} p_k(a,b)\,s^k + \frac{K}{N}\Big)$$

with $K := c\sum_{k\in\mathbb Z} k^2 s^k\,p_k(a,b)$. To deal with the terms on the right-hand
side we note first that $p_k(a,b)\,s^k = e^{as+b/s-a-b}\,p_k(as,b/s)$ by formula (60).
Hence

$$\sum_{k\in\mathbb Z} p_k(a,b)\,s^k = \exp(as+b/s-a-b) = \exp\big(-g(z)\big) + O\Big(\frac1N\Big),$$

because $a \approx (1-z)\mu$ and $b \approx z\nu$ up to error terms of order $1/N$. Likewise,

$$K = c\,\exp(as+b/s-a-b)\sum_{k\in\mathbb Z} k^2\,p_k(as,b/s)$$

is bounded in $N$ since so are $a$, $b$ and $s$. Since also $va+wb = O(1/N)$, we finally
arrive at the estimate

$$\sum_{|k|\le\sqrt N} \bar r_k \le e^{-g(z)} + O\Big(\frac1N\Big). \tag{72}$$
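The tilting identity $p_k(a,b)\,s^k = e^{as+b/s-a-b}\,p_k(as,b/s)$ and the resulting value $e^{-g(z)}$ of the tilted sum can be checked as follows. This Python sketch again assumes that $p_k(a,b)$ is the law of the difference of independent $\mathrm{Poi}_a$ and $\mathrm{Poi}_b$ variables, and it uses the leading-order values $a = \mu(1-z)$, $b = \nu z$ instead of the exact ones in (70); the helper names are hypothetical.

```python
import math

def poi(a: float, n: int) -> float:
    """Poisson(a) probability of n; zero for n < 0."""
    if n < 0:
        return 0.0
    return math.exp(-a + n * math.log(a) - math.lgamma(n + 1))

def p(k: int, a: float, b: float, nmax: int = 200) -> float:
    """Assumed p_k(a, b) = P(X - Y = k), X ~ Poi(a), Y ~ Poi(b), a, b > 0."""
    return sum(poi(a, n + k) * poi(b, n) for n in range(max(0, -k), nmax))

def tilt_error(a: float, b: float, s: float, kmax: int = 15) -> float:
    """Max over |k| <= kmax of |p_k(a,b) s^k - e^{as+b/s-a-b} p_k(as, b/s)|."""
    pref = math.exp(a * s + b / s - a - b)
    return max(abs(p(k, a, b) * s ** k - pref * p(k, a * s, b / s))
               for k in range(-kmax, kmax + 1))

mu, nu, z = 2.0, 1.0, 0.4            # illustrative parameters
a, b = mu * (1 - z), nu * z          # leading-order values of a, b in (70)
s = math.sqrt(z * nu / ((1 - z) * mu))
tilted = sum(p(k, a, b) * s ** k for k in range(-60, 61))
g = (math.sqrt(a) - math.sqrt(b)) ** 2
print(tilt_error(a, b, s), tilted, math.exp(-g))
```

With this choice of $s$ one has $as = b/s = \sqrt{ab}$, so the exponent $as + b/s - a - b$ collapses to $-(\sqrt a - \sqrt b)^2 = -g(z)$.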



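Step 3: Conclusion. Consider now $q_k := \sqrt{p_k(z)\,p_{-k}(z)}$. By (61), $q_k =
p_k(z)\,s^k$. It is also immediate that $p_k(z) \le e^{va+wb}\,p_k(a,b)$ with $a, b$ as in
(70). Hence $q_k \le \bar r_k$ for all $k$. Combining this with Lemmas 2 and 3 and the
bounds (67) and (71) we find

$$\begin{aligned}
\sum_{k\in S-i} |r_k - q_k| &= \sum_{k\in S-i}\big(2\max(r_k,q_k) - r_k - q_k\big)\\
&\le 2\sum_{|k|\le\sqrt N}\bar r_k - \sum_{k\in S-i} r_k - \sum_{k\in S-i,\ |k|\le\sqrt N} q_k + O\Big(\frac1N\Big).
\end{aligned}$$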

Now, (68) and (72) show that the first sum exceeds the second only by a
term of order $1/N$, and (72) together with (63) and Lemma 2 implies that the
first sum exceeds the third by at most a term of order $1/N$. This completes
the proof. □
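Proposition 4 can be illustrated by evaluating the left-hand side of (66) for growing $N$ at a fixed interior $z$. As before, this Python sketch assumes that (50) is the lumped independent-site mutation matrix with per-site flip probabilities $v = \mu/N$ and $w = \nu/N$, and that $p_k(a,b)$ is the law of a difference of independent Poisson variables; `P_entry` and `lhs_66` are hypothetical names and the parameters are illustrative.

```python
import math

def binom_pmf(n: int, p: float, l: int) -> float:
    """Bin_{n,p}(l), zero outside 0..n; evaluated in log space (0 < p < 1)."""
    if l < 0 or l > n:
        return 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(l + 1) - math.lgamma(n - l + 1)
                    + l * math.log(p) + (n - l) * math.log1p(-p))

def poi(a: float, n: int) -> float:
    """Poisson(a) probability of n; zero for n < 0."""
    if n < 0:
        return 0.0
    return math.exp(-a + n * math.log(a) - math.lgamma(n + 1))

def p(k: int, a: float, b: float, nmax: int = 100) -> float:
    """Assumed p_k(a, b) = P(X - Y = k), X ~ Poi(a), Y ~ Poi(b), a, b > 0."""
    return sum(poi(a, n + k) * poi(b, n) for n in range(max(0, -k), nmax))

def P_entry(N: int, i: int, j: int, mu: float, nu: float) -> float:
    """Assumed form of (50): P_ij = sum_l Bin_{N-i,v}(j-i+l) Bin_{i,w}(l)."""
    v, w = mu / N, nu / N
    return sum(binom_pmf(N - i, v, j - i + l) * binom_pmf(i, w, l)
               for l in range(i + 1))

def lhs_66(N: int, i: int, mu: float, nu: float) -> float:
    """Sum over j of |sqrt(P_ij P_ji) - sqrt(p_{j-i}(z) p_{i-j}(z))|, z = i/N."""
    z = i / N
    a, b = mu * (1 - z), nu * z      # p_k(z) = p_k(mu(1-z), nu z) as in (62)
    total = 0.0
    for j in range(N + 1):
        r = math.sqrt(P_entry(N, i, j, mu, nu) * P_entry(N, j, i, mu, nu))
        q = math.sqrt(p(j - i, a, b) * p(i - j, a, b))
        total += abs(r - q)
    return total

mu, nu = 2.0, 1.0                    # illustrative parameters
errors = {N: lhs_66(N, 2 * N // 5, mu, nu) for N in (50, 100, 200, 400)}
for N, err in errors.items():
    print(N, err, N * err)
```

If (66) holds, the products `N * err` in the last column should stay roughly constant.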
Besides the preceding key approximation in the interior of $D$, we also
need a uniform bound which will be used close to the boundary of $D$. Here,
an error bound of order $1/\sqrt N$ is sufficient.

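Proposition 5. For all $i \in S$,

$$\sum_{j\in S}\sqrt{P_{ij}\,P_{ji}} \le e^{-g(i/N)} + O\Big(\frac{1}{\sqrt N}\Big),$$

where the $O(1/\sqrt N)$ term depends on $\mu$ and $\nu$ but not on $i$.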

Proof. In view of Lemma 3, we only need to estimate the sum over all
$j = i+k \in S$ with $|k| \le \sqrt N$. For these $j$ we find, using the Poisson
bound (64) and writing again $z = i/N$,

$$\sqrt{P_{i,i+k}\,P_{i+k,i}} \le e^{\bar a+\bar b}\sqrt{p_k(a,b)\,p_{-k}(a,b)}.$$

Here $a = \mu(1-z+1/\sqrt N)/(1-v)$, $b = \nu(z+1/\sqrt N)/(1-w)$, and

$$\bar a := a - \mu\big(1-z-1/\sqrt N\big) = 2\mu/\sqrt N + O(1/N),\qquad \bar b := b - \nu\big(z-1/\sqrt N\big) = 2\nu/\sqrt N + O(1/N),$$

so that $e^{\bar a+\bar b} = 1 + O(1/\sqrt N)$; the error terms do not depend on $i$. The
claim thus follows from Lemmas 3, 2, 1 and the fact that $(\sqrt a-\sqrt b)^2 =
g(z) + O(1/\sqrt N)$. □

After these preparations we are now ready to read off the approximating
functions $e$ and $f_k$ for the lumped quasispecies model, that is, we can
proceed to the

Proof (of Thm. 6). The main point of the proof is to establish condition
(A1) for any compact interval $A \subset\, ]0,1[$. The (asymptotic) boundedness of
the $B_j$'s and Lemma 3 imply that, for each $i \in S$,

$$\sum_{k\in S-i}\sqrt{B_iB_{i+k}P_{i,i+k}P_{i+k,i}} = O\Big(\frac1N\Big) + \sum_{k\in S-i,\ |k|\le\sqrt N}\sqrt{B_iB_{i+k}P_{i,i+k}P_{i+k,i}}. \tag{73}$$

The asymptotics of the $B_j$'s implies further that the sum on the right-hand
side is equal to

$$(1+o(1))\,b(z)\sum_{|k|\le\sqrt N}\exp\big(\beta(z+k/N)-\beta(z)\big)\sqrt{P_{i,i+k}P_{i+k,i}}, \tag{74}$$

where $z := i/N$ and $\beta(x) := (\log b(x))/2$. By hypothesis, $\beta \in C^2([0,1])$.
Hence $\beta(z+k/N)-\beta(z) = \beta'(z)\,k/N + O(1/N)$ for $|k| \le \sqrt N$, so that the
last expression takes the form

$$(1+o(1))\,b(z)\sum_{|k|\le\sqrt N} e^{\delta k/N}\sqrt{P_{i,i+k}P_{i+k,i}}, \tag{75}$$



Mutation, selection and ancestry in branching models 



43 



where $\delta := \beta'(z)$. Next we can omit the exponential $e^{\delta k/N}$, making an error
of order $1/N$ only. Indeed, using inequality (65) together with Lemma 1 and
setting $a := \mu/(1-v)$, $b := \nu/(1-w)$ we obtain

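$$\begin{aligned}
\sum_{k\in S-i}\big|e^{\delta k/N}-1\big|\sqrt{P_{i,i+k}P_{i+k,i}}
&\le e^{a+b}\sum_{k\in\mathbb Z}\big(e^{|\delta||k|/N}-1\big)\,p_k\big(\sqrt{ab},\sqrt{ab}\big)\\
&\le e^{a+b}\Big(\exp\big[2\sqrt{ab}\,\big(e^{|\delta|/N}-1\big)\big]-1\Big) = O\Big(\frac1N\Big).
\end{aligned}\tag{76}$$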

The second inequality is obtained by taking formula (60) for $p_k(\sqrt{ab},\sqrt{ab})$,
using $e^{|\delta||k|/N} \le e^{|\delta|m/N}\,e^{|\delta|n/N}$ for $k = m-n$, and summing up. If $z = i/N$ is bounded
away from 0 and 1, we can finally apply Prop. 4, Lemma 2 and the identity
(63) to obtain



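$$\begin{aligned}
\sum_{k\in S-i}\sqrt{B_iB_{i+k}P_{i,i+k}P_{i+k,i}}
&= O\Big(\frac1N\Big) + (1+o(1))\,b(z)\sum_{|k|\le\sqrt N}\sqrt{p_k(z)\,p_{-k}(z)}\\
&= b(z)\,e^{-g(z)} + o(1).
\end{aligned}\tag{77}$$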

Taking this together with the assumed approximation of the $D_i$, we arrive
at the approximation (A1) of $E$, and of the diagonal elements of $F$, by the
functions $e$ and $f_k$ defined in (55) and (58). But the approximation of the
nondiagonal elements of $F$ is also guaranteed, since the estimates leading
to (77) all hold term by term. This completes the proof of (A1).
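The second inequality in (76) rests on the bound $\sum_k e^{|\delta||k|/N}p_k(c,c) \le \exp[2c(e^{|\delta|/N}-1)]$ with $c = \sqrt{ab}$, which follows from $|X-Y| \le X+Y$ for independent Poisson variables. This Python sketch checks it numerically under the same assumption on $p_k$ as above; `mgf_sides` is a hypothetical name and `c`, `delta` are illustrative stand-ins for $\sqrt{ab}$ and $\beta'(z)$.

```python
import math

def poi(a: float, n: int) -> float:
    """Poisson(a) probability of n; zero for n < 0."""
    if n < 0:
        return 0.0
    return math.exp(-a + n * math.log(a) - math.lgamma(n + 1))

def p(k: int, a: float, b: float, nmax: int = 200) -> float:
    """Assumed p_k(a, b) = P(X - Y = k), X ~ Poi(a), Y ~ Poi(b), a, b > 0."""
    return sum(poi(a, n + k) * poi(b, n) for n in range(max(0, -k), nmax))

def mgf_sides(c: float, delta: float, N: int, kmax: int = 60):
    """Return (sum_k e^{|delta||k|/N} p_k(c,c), exp[2c(e^{|delta|/N} - 1)])."""
    t = abs(delta) / N
    lhs = sum(math.exp(t * abs(k)) * p(k, c, c) for k in range(-kmax, kmax + 1))
    rhs = math.exp(2 * c * (math.exp(t) - 1))
    return lhs, rhs

c, delta = 0.8, 1.5                  # stand-ins for sqrt(ab) and beta'(z)
for N in (10, 100, 1000):
    lhs, rhs = mgf_sides(c, delta, N)
    print(N, N * (lhs - 1), N * (rhs - 1))
```

Both $N(\mathrm{lhs}-1)$ and $N(\mathrm{rhs}-1)$ stay bounded, which is the $O(1/N)$ statement in (76).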

Next, condition (A2) follows directly from Lemma 1 and the fact that
Poisson distributions and their convolutions have a finite third moment. We
note further that the upper bound on $F_{ij}$ and the lower bound on $E_i$ in (A3)
and (A3') are obvious.

Before turning to the upper bound on $E_i$ let us discuss the particular
context of Theorems 3, 4 and 5. We observe that the function $g$ is continuous
on $[0,1]$ and smooth on $]0,1[$ with $g'(0) = -g'(1) = -\infty$, while $b > 0$ and $d$
are $C^2$ functions on $[0,1]$. This entails that $e$ is $C^2$ on $]0,1[$ and attains its
absolute maximum at a point $z^* \in\, ]0,1[$; in particular, $z^*$ is contained in an
interval $A$ satisfying (A1) and (A2), as is required for Thm. 3. In addition,
$e''$ is negative in a neighbourhood of 0 and 1, and has only finitely many
zeroes by assumption. This implies that each basin of $e$ is determined by
smooth hills, as is necessary for applying Thm. 5.
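The boundary behaviour of $g$ can be made explicit. Reading off (63) together with the end of the proof of Prop. 5, $g(z) = \big(\sqrt{\mu(1-z)}-\sqrt{\nu z}\big)^2$; this closed form is inferred here, since (54) is not part of this excerpt. The Python sketch below evaluates $g$ and its derivative $g'(z) = -\mu + \nu - \sqrt{\mu\nu}\,(1-2z)/\sqrt{z(1-z)}$ on $]0,1[$ for illustrative parameters.

```python
import math

def g(z: float, mu: float, nu: float) -> float:
    """Assumed closed form of g on [0, 1]: (sqrt(mu(1-z)) - sqrt(nu z))^2."""
    return (math.sqrt(mu * (1 - z)) - math.sqrt(nu * z)) ** 2

def g_prime(z: float, mu: float, nu: float) -> float:
    """Derivative of g on ]0,1[; diverges to -inf at 0 and to +inf at 1."""
    return -mu + nu - math.sqrt(mu * nu) * (1 - 2 * z) / math.sqrt(z * (1 - z))

mu, nu = 2.0, 1.0                    # illustrative parameters
for z in (1e-8, 1e-4, 0.5, 1 - 1e-4, 1 - 1e-8):
    print(z, g(z, mu, nu), g_prime(z, mu, nu))
```

The function is finite at both endpoints ($g(0) = \mu$, $g(1) = \nu$) and vanishes at $z = \mu/(\mu+\nu)$, while its slope diverges like $z^{-1/2}$ and $(1-z)^{-1/2}$ near the endpoints.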

Now let $A'$ be any set of exposed smoothness points of $e$ which is bounded
away from 0 and 1. If $\delta > 0$ is sufficiently small, the set $A_\delta$ of all $y$ satisfying
$e(y) \ge t_z(y) - \delta$ for all $z \in A'$ is still bounded away from 0 and 1. For all $i$
with $i/N \in A_\delta$, the upper bound on $E_i$ in (A3') follows directly from (77).
For all other $i$'s, this bound follows from (73)-(76) as soon as $N$ is so large
that the $O(1/\sqrt N)$-term in Prop. 5 is less than $\delta/\max b$. This completes
the proof of (A3') under the conditions of Thms. 4 and 5. Since $]0,1[$ splits
into finitely many intervals forming basins and smooth hills of $e$, the stated
approximation result for $\lambda$ follows. Finally, as $z^*$ is also an exposed point,
the choice $A' = \{z^*\}$ gives us the upper bound on $E_i$ in (A3). Thm. 3 can
therefore be applied, proving the approximation of $\lambda$ as stated. □